Finding word position (start/end) in a word document (II)

Fernando Cabral · Jul 14, 2006

About my previous question (finding word position (start/end) in a Word
document),
a very simple solution would be an assignment that could replace the one
I am using. That is, instead of

s = ActiveDocument.Content.Text ' which does not work

something like

s = ActiveDocument.Content.TextWithEmbeddedStuff

If such a property/method exists, then my program would work
with this single modification.

Thank you

- fernando

Dave Lett · Jul 14, 2006

Hi Fernando,

You might be able to use something like the following, which is more
efficient:

Dim lStart As Long
Dim lEnd As Long
Dim oWrd As Range
' Each item in Words is a Range, so its position can be read directly
For Each oWrd In ActiveDocument.Words
    lStart = oWrd.Start
    lEnd = oWrd.End
Next oWrd

HTH,
Dave

Fernando Cabral · Jul 15, 2006

Dave

Your solution worked much better than mine. Time was down from about
one hour to 1 minute, 3 seconds, 141 thousandths of a second.
Nevertheless, it is still too slow for any practical ("real-time") application.
Also, it still compares very poorly with my original solution, which
does the same thing in about 4 seconds. (But it can't be used
as a generic solution because it fails when the text has embedded images.)

There must be a faster solution. Otherwise spellchecker/find/replace
and the like would be too slow to be acceptable.

But your solution was very useful, anyway. On one hand, I can
use it for other applications. On the other hand, if I can't
find an alternative, I still have a solution that works during lunch time.
That is, I start it before lunch and get the results after lunch.

Thank you.

- fernando

Helmut Weber · Jul 15, 2006

Hi Fernando

it seems (!), that every object other than
an ordinary text object (word, character, etc ...)
in the selection or in a range is represented by chr(1).

Like:
MsgBox InStr(selection.Range.Text, Chr(1))

which would return x, the position of
an inlineshape in the range, for example.
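
The single InStr call above reports only the first match. A minimal sketch that lists every Chr(1) in the selected text could look like the following; the Sub name is made up for illustration, and it assumes that embedded objects really do show up as Chr(1) in the text, as observed above.

```vba
Sub ListObjectPlaceholders()
    ' Hypothetical helper: prints the string position of every Chr(1)
    ' found in the text of the current selection.
    Dim sText As String
    Dim lPos As Long

    sText = Selection.Range.Text
    lPos = InStr(1, sText, Chr(1))
    Do While lPos > 0
        Debug.Print "Chr(1) at string position " & lPos
        ' Continue the search just past the previous match
        lPos = InStr(lPos + 1, sText, Chr(1))
    Loop
End Sub
```

The positions printed are 1-based offsets into the string, which need not match the Start/End values of the corresponding ranges.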

I don't know, whether this can help you.

--
Greetings from Bavaria, Germany

Helmut Weber, MVP WordVBA

Win XP, Office 2003
"red.sys" & Chr$(64) & "t-online.de"

Fernando Cabral · Jul 15, 2006

Helmut and Klaus

I put your suggestions (as well as Dave's) to good use.
They helped me shrink processing time from about one hour
to about 1 minute. That's still too much, but much closer to
what I need. (I need something that does the whole thing in
less than one minute. This leaves about 4 or 5 seconds for
the word-collecting phase.)

Now, to the specifics:

it seems (!), that every object other than
an ordinary text object (word, character, etc ...)
in the selection or in a range is represented by chr(1).

I found 28 positions for a drawing. See the dump below:

0,27:^A
27,28:^M
28,29:^M
29,41:PRESIDÊNCIA
41,44:DA
44,53:REPÚBLICA
53,54:^M

^A is chr(1). It takes 28 positions (from 0 to 27).
Also, each word occupies two positions more. See PRESIDÊNCIA. It is
11 characters long, but goes from 29 to 41. I still can't understand why.

See a dump from the TOC:

1950,1972:Apresentação
1972,1973:^I
1973,2006:VIII
2006,2007:^M
2007,2014:Sinais
2014,2016:e
2016,2029:Abreviaturas
2029,2039:Empregados
2039,2040:^I
2040,2071:IX

VIII and IX are page numbers. They occupy 31 positions. Nevertheless,
I can't see where they are. The same happens if the page number is in Arabic numerals,
as seen from the positions consumed by 2 (2111,2141):

2089,2102:COMUNICAÇÕES
2102,2110:OFICIAIS
2110,2111:^I
2111,2141:2
2141,2142:^M
2142,2151:CAPÍTULO

Still, the solutions you guys suggested take care of those issues. They work
even if I can't understand why those gaps are there.

If I can't find a faster way to do it, I'll have to make do with
a few minutes instead of the expected few seconds. :-(

Thank you guys. (if you still have ideas I can try, keep sending them)

- fernando

Fernando Cabral · Jul 16, 2006

Here's how I solved the problem:

Where I had

s=activedocument.content.text

I now have

GetTheWholeThing s

The sub being:

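Sub GetTheWholeThing(ByRef s As String)
    Dim oWrd As Word.Range

    ' Pre-fill s with spaces, one per position in the document
    s = Space(ActiveDocument.Characters.Last.End)

    ' Write each word at its own range offset (Mid is 1-based)
    For Each oWrd In ActiveDocument.Words
        Mid(s, oWrd.Start + 1) = oWrd.Text
    Next oWrd
End Sub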

It works. The document that gave me a headache (the largest
and most complex one) is now being processed in 65 seconds.
Barely acceptable. On the bright side, the whole program
(more than 2000 lines) is still working as it used to when the
documents didn't have images, TOCs and other things that
introduce "phantom" spaces that you cannot reach using
activedocument.content.text...

I still hope for some improvement...

Thank you all, guys.

- fernando

Klaus Linke · Jul 16, 2006

Yes, the Chr(1) are objects.
The ^M are presumably paragraph marks, Chr(13), and the ^I tabs, Chr(9).

The page numbers are probably fields. If you turn on field codes in Tools >
Options > View (or Alt+F9), you can see the field code.
The opening field brace is Chr(19), the closing brace Chr(21).

If you read the stuff into a string, you'll get either the field code (with
the braces), or the result, depending on whether you view field codes or
not.
If you go through the ranges, you'll see a range of length 1 for the opening
brace, then the field code, a range of length 1 for the field separator
(Chr(20)), the field result, followed by another range of length 1 for the
closing field brace.

The TOC is a field too, and may contain additional fields (say if you have
hyperlinks back to the headings).
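
To see that layout in a given document, a rough sketch like the one below could print where each field's code and result sit. The Sub name is only illustrative, the error guard is an assumption about how fields without a result behave, and ActiveDocument.Fields covers only the main text story.

```vba
Sub ListFieldRanges()
    ' Hypothetical helper: prints the code and result ranges of each field.
    Dim oFld As Field
    Dim sResult As String

    For Each oFld In ActiveDocument.Fields
        ' Some fields may have no result; guard the access rather than assume.
        sResult = "(no result)"
        On Error Resume Next
        sResult = oFld.Result.Start & "-" & oFld.Result.End
        On Error GoTo 0

        Debug.Print "Field type " & oFld.Type & ": code " & _
            oFld.Code.Start & "-" & oFld.Code.End & ", result " & sResult
    Next oFld
End Sub
```

The positions that lie between the code range and the result range, and just outside them, are the ones taken up by the field braces.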

Greetings,
Klaus

Klaus Linke · Jul 16, 2006

You could speed it up more, but it would take quite a bit of work.

If you know where the fields and object anchors are, you can "map" s1 =
ActiveDocument.Content.Text to the string you want to build (which is as
"long" as the range).
Then you don't need to get and place the words individually (Mid(s,
oWrd.start + 1) = oWrd.Text), which is what takes so long. Instead you could
build the new string completely with string functions from s1, which is
much, much quicker.

For that, you need to loop the fields and shapes and collect their positions
(which takes a bit of time, but is usually pretty quick).
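
As a rough sketch of the shape half of that loop (not the code I mentioned, just an illustration with a made-up Sub name), the positions could be collected like this:

```vba
Sub ListShapePositions()
    ' Hypothetical helper: prints where shapes sit in the main document text.
    Dim oInl As InlineShape
    Dim oShp As Shape

    ' Inline shapes sit in the text flow and have a range of their own.
    For Each oInl In ActiveDocument.InlineShapes
        Debug.Print "Inline shape: " & oInl.Range.Start & "-" & oInl.Range.End
    Next oInl

    ' Floating shapes are tied to the text through their anchor range.
    For Each oShp In ActiveDocument.Shapes
        Debug.Print "Floating shape anchored at " & oShp.Anchor.Start
    Next oShp
End Sub
```

In a real version the positions would go into an array instead of the Immediate window, so they can be used when building the string.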

Why didn't I mention tables? End-of-cell markers and end-of-row markers take
two characters in the string, but only a range of length one each. But you
can remove that problem by simply replacing Chr(13) & Chr(7) with some
character of your choice.
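
A minimal sketch of that replacement, with "|" as an arbitrary stand-in character and a made-up Sub name:

```vba
Sub ShowCellMarkerReplacement()
    ' Hypothetical helper: collapses each two-character cell/row marker
    ' into one character so string offsets line up with range lengths.
    Dim s1 As String

    s1 = ActiveDocument.Content.Text
    Debug.Print "Length before: " & Len(s1)

    ' "|" is an arbitrary choice for this sketch; any single character works.
    s1 = Replace(s1, Chr(13) & Chr(7), "|")
    Debug.Print "Length after:  " & Len(s1)
End Sub
```

The difference between the two lengths is the number of cell and row markers in the document.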

One problem you may run into is that there are lots of different kinds of
fields. Some have no Result, only Code, so you will have to treat those
differently.

It took me about a day to work it out, and I haven't really debugged the
code thoroughly. If you want, I can mail you what I have (... no promise
that it'll work reliably: You might be better off writing it yourself
instead of debugging mine).
Send me a mail if you are interested.

Regards,
Klaus
