Finding word position (start/end) in a Word document

Fernando Cabral · Jul 14, 2006

By doing

For i = 1 To ActiveDocument.Words.Count
    ActiveDocument.Words(i).Select
    word(i).start = ActiveDocument.Words(i).Start
    word(i).end = ActiveDocument.Words(i).End
Next i

I create an array with the start/end positions of every word in a Word document.
Problem: SLLLLLLOOOOOWWWWWW. It takes forever even for a "small"
document with (say) 300 pages.
I can do the same by first copying the whole text into a variable and then
tokenizing it. Say:

Dim s As String
s = ActiveDocument.Content.Text
For i = 1 To Len(s)
    word(i).start = NextToken(s).start
    word(i).end = NextToken(s).end
Next i

The second method is a hundred times faster.
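NextToken is the poster's own function and isn't shown. Purely as an untested sketch, a single-pass tokenizer over Content.Text could look like the following. It assumes a "word" is a run of letters, digits or straight apostrophes; the names TokenizeContent, IsWordChar, tokStart and tokEnd are hypothetical. Positions are 0-based with an exclusive end so they line up with Range.Start/Range.End, which only holds while the text contains no fields, shapes or tables. Word's own Words collection also treats punctuation as separate words and includes trailing spaces, so the token count will not match Words.Count exactly.

```vba
' Untested sketch of a tokenizer in the spirit of the (unshown) NextToken.
' Assumption: a word is a maximal run of letters, digits or apostrophes.
Sub TokenizeContent()
    Dim s As String
    s = ActiveDocument.Content.Text

    ' Upper bound: there can never be more tokens than characters.
    Dim tokStart() As Long, tokEnd() As Long
    ReDim tokStart(1 To Len(s))
    ReDim tokEnd(1 To Len(s))

    Dim i As Long, n As Long, inWord As Boolean
    For i = 1 To Len(s)
        If IsWordChar(Mid$(s, i, 1)) Then
            If Not inWord Then
                n = n + 1
                tokStart(n) = i - 1      ' 0-based start, like Range.Start
                inWord = True
            End If
        ElseIf inWord Then
            tokEnd(n) = i - 1            ' exclusive end, like Range.End
            inWord = False
        End If
    Next i
    If inWord Then tokEnd(n) = Len(s)    ' text ended inside a word

    Debug.Print n & " tokens found"
End Sub

Private Function IsWordChar(ByVal c As String) As Boolean
    ' A letter is any character whose upper and lower case differ
    ' (this also covers accented letters); digits and ' count as well.
    IsWordChar = (UCase$(c) <> LCase$(c)) Or (c Like "[0-9']")
End Function
```

Because the loop only reads the string, it avoids the per-word object calls that make the first method slow.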
The problem arises when the document is not "plain" text, that is, when it also contains
pictures, drawings, a TOC, etc.

In this case each non-textual element adds an extra offset to the positions
from the first method, but not to those from the second. As we move towards the
end of the text, the offset keeps growing with each non-textual element we pass.
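A quick way to see the net size of that offset is to compare the number of character positions in the main text range with the length of the extracted string. This is an untested sketch (ShowTotalOffset is a hypothetical name). It only reports the overall difference: if different elements shift the positions in opposite directions, they could partly cancel out, so a small total does not prove the two methods agree everywhere.

```vba
' Untested sketch: net difference between document positions and
' the length of the extracted text for the main story.
Sub ShowTotalOffset()
    Dim docLen As Long, strLen As Long

    ' Range positions, counted the way Range.Start/Range.End count them.
    docLen = ActiveDocument.Content.End - ActiveDocument.Content.Start

    ' Characters in the string used by the tokenizing method.
    strLen = Len(ActiveDocument.Content.Text)

    Debug.Print "Range positions: " & docLen & _
                ", string length: " & strLen & _
                ", difference: " & (docLen - strLen)
End Sub
```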

Question: is there a way for me to find out how many objects there are in the
text, where they are, and how many bytes they take?

- fernando

Klaus Linke · Jul 15, 2006

Hi Fernando,

> By doing
>
> For i = 1 To ActiveDocument.Words.Count
>     ActiveDocument.Words(i).Select
>     word(i).start = ActiveDocument.Words(i).Start
>     word(i).end = ActiveDocument.Words(i).End
> Next i
>
> I create an array with the start/end positions of every word in a Word document.
> Problem: SLLLLLLOOOOOWWWWWW. It takes forever even for a "small"
> document with (say) 300 pages.

Dim myWord As Range
For Each myWord In ActiveDocument.Words
    ' Do something with myWord
Next myWord

would be quite a bit faster.
In your code above, Word has to locate Words(i) in each iteration by
counting words from the start, and that takes longer and longer.
There's also probably no reason to select anything... that only takes time.
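Filling the position arrays with that For Each loop might look like the untested sketch below. It records the same Start/End values as the original loop, but without Select and without indexing Words(i). The procedure name and the arrays wordStart and wordEnd are hypothetical names chosen for this example.

```vba
' Untested sketch: record Start/End of every word in the main text.
Sub CollectWordPositions()
    Dim n As Long
    n = ActiveDocument.Words.Count

    Dim wordStart() As Long
    Dim wordEnd() As Long
    ReDim wordStart(1 To n)
    ReDim wordEnd(1 To n)

    Dim myWord As Range
    Dim i As Long
    ' For Each steps through the collection in order, so Word does not
    ' have to count from the start of the document for every Words(i).
    ' Nothing is selected, which avoids screen updates.
    For Each myWord In ActiveDocument.Words
        i = i + 1
        wordStart(i) = myWord.Start
        wordEnd(i) = myWord.End
    Next myWord

    Debug.Print i & " words recorded"
End Sub
```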

> I can do the same by first copying the whole text into a variable and then
> tokenizing it. Say:
>
> Dim s As String
> s = ActiveDocument.Content.Text
> For i = 1 To Len(s)
>     word(i).start = NextToken(s).start
>     word(i).end = NextToken(s).end
> Next i
>
> The second method is a hundred times faster.
> The problem arises when the document is not "plain" text, that is, when it also
> contains pictures, drawings, a TOC, etc.
>
> In this case each non-textual element adds an extra offset to the positions
> from the first method, but not to those from the second. As we move towards the
> end of the text, the offset keeps growing with each non-textual element we pass.
>
> Question: is there a way for me to find out how many objects there are in the
> text, where they are, and how many bytes they take?

In principle: yes. There's one extra character for each shape anchor and each
inline graphic/object, I think; two characters for each table cell, plus another
two for each table row. And then there is one character for each field's opening
brace and one for its closing brace, plus one between the field code and the
field result (both of which will be in the string).
It's doable, but not too easy. Depending on what you do, you might check
whether you have other options (say, work with range.XML or with HTML code
of the document to get at all the formatting and other stuff).
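To locate those elements rather than just count them, one option is to walk the relevant collections and print where each one sits. The following untested sketch (ListNonTextElements is a hypothetical name) writes the Range positions of inline shapes, floating-shape anchors, fields and tables to the Immediate window. It does not compute the per-element character counts described above; those figures would still have to be applied on top of these positions.

```vba
' Untested sketch: print where the non-text elements sit in the main
' text, using document character positions (Range.Start / Range.End).
Sub ListNonTextElements()
    Dim doc As Document
    Set doc = ActiveDocument

    ' Inline pictures/objects occupy a position in the text flow.
    Dim ils As InlineShape
    For Each ils In doc.InlineShapes
        Debug.Print "InlineShape at " & ils.Range.Start
    Next ils

    ' Floating shapes are not in the text; Anchor is the range
    ' they are anchored to.
    Dim shp As Shape
    For Each shp In doc.Shapes
        Debug.Print "Shape anchored at " & shp.Anchor.Start
    Next shp

    ' Fields: the code and the result are separate ranges.
    Dim fld As Field
    For Each fld In doc.Fields
        Debug.Print "Field code " & fld.Code.Start & "-" & fld.Code.End & _
                    ", result " & fld.Result.Start & "-" & fld.Result.End
    Next fld

    ' Tables (top level only): overall span plus cell and row counts.
    Dim tbl As Table
    For Each tbl In doc.Tables
        Debug.Print "Table " & tbl.Range.Start & "-" & tbl.Range.End & _
                    ": " & tbl.Range.Cells.Count & " cells, " & _
                    tbl.Rows.Count & " rows"
    Next tbl
End Sub
```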

Regards,
Klaus

Klaus Linke · Jul 15, 2006

Oops, didn't see you had started another thread...

Klaus

cradino · Jul 16, 2006

Dear Fernando Cabral,
Wherever you are, you understand this text. Out of Portuguese-speaking
solidarity, would you give me a little help understanding what was discussed
in this conversation?
This is not an answer but a request for help.
I would ask you to "dress up" the macro so that I can use it and understand
what it does... without much work! Thank you.
Regards,
Arcindo Lucas

"Klaus Linke" wrote:
