OLE object model

Dan

I have written a VB program to read Word documents using Word.Application. The problem is that it is extremely slow. My program is simply looping through the paragraphs in the main story in a sample Word document and writing them to a text file. I am running Windows XP Professional on a 2.2 GHz Dell system with Word 2000. My test file is 77 KB, of which the main story object is about half of the file size. It takes 9 minutes to loop through the 262 paragraphs and output them to a text file. Am I doing something wrong that it should take so long?

To put it in perspective, my program also includes the capability to convert WordPerfect documents. If this were a WordPerfect file, this same conversion to a text file would take 10-20 seconds at the most.

Any suggestions or help would be appreciated. Thank you!
 
Word Heretic

G'day Dan,

a) I have a Word VBA Beginner's Spellbook for sale from my site with
many optimisation techniques as part of a final proofing list I use
for my development program.

b) Use With judiciously to avoid object reload.


Steve Hudson

Word Heretic, Sydney, Australia
Tricky stuff with Word or words for you.
wordheretic.com

If my answers r 2 terse, ask again or hassle an MVP,
at least they get recognition for it then.
Lengthy replies offlist require payment.
 
Jay Freedman

Since you don't show the code you're using now, it's a little
difficult to recommend specifics, but here are some things that may
help:

Don't use an index into the Paragraphs collection like this:

    Dim i As Integer
    For i = 1 To ActiveDocument.Paragraphs.Count
        ' do something with ActiveDocument.Paragraphs(i)
    Next i

Every time you refer to ActiveDocument.Paragraphs(i), this syntax
makes Word count from 1 to i to find the right paragraph. Instead, do
something like this:

    Dim aPara As Paragraph
    For Each aPara In ActiveDocument.Paragraphs
        ' do something with aPara
    Next aPara

If you're using the .Select method to select a paragraph and then
working with the Selection object, don't do that either. This forces
Word to scroll and redraw the screen, which is very slow. Instead,
work with aPara.Range -- for example, writing out aPara.Range.Text to
the text file.
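
A minimal sketch of that loop, assuming a reference to the Word object library is set. The output path and the StripParagraphMark helper are hypothetical names invented for this example. When run from VB rather than from Word's own VBA, ActiveDocument would need to be qualified with the Word.Application object.

```vba
' Hypothetical helper: drop the trailing paragraph mark (vbCr) that
' Range.Text includes at the end of each paragraph.
Function StripParagraphMark(ByVal s As String) As String
    If Right$(s, 1) = vbCr Then
        StripParagraphMark = Left$(s, Len(s) - 1)
    Else
        StripParagraphMark = s
    End If
End Function

Sub ExportParagraphs()
    Dim aPara As Word.Paragraph
    Dim fileNum As Integer

    fileNum = FreeFile
    ' The path is an assumption for illustration only.
    Open "C:\Temp\output.txt" For Output As #fileNum

    ' For Each walks the collection once; no indexing, no Selection.
    For Each aPara In ActiveDocument.Paragraphs
        ' Print # appends its own line ending to each paragraph.
        Print #fileNum, StripParagraphMark(aPara.Range.Text)
    Next aPara

    Close #fileNum
End Sub
```

Reading aPara.Range.Text once per paragraph keeps the number of calls into Word proportional to the paragraph count rather than the character count.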
 
Dan

Steve,

I went to your Web site and ordered a copy of your Word VBA Spellbook through PayPal. I also sent an e-mail message to the address given on your site asking what I had to do to get your product. I didn't know if I had to initiate something to get the book downloaded to me. Today I received a message from PayPal saying that my payment had not been picked up. Have you received anything from me or PayPal about this?

Sincerely,
Dan

 
Dan

Jay,

Thank you for your helpful suggestions. I used them to make a few modifications to my code. Here is what it looks like at this point:

    Dim CharObj As Word.Range
    Dim MainStory As Word.Range
    Dim PgfObj As Word.Paragraph
    Dim PgfRange As Word.Range
    Dim TextChar As String
    Dim TextStr As String

    Set MainStory = ActiveDocument.Content

    With MainStory
        For Each PgfObj In .Paragraphs
            With PgfObj
                Set PgfRange = ActiveDocument.Range(Start:=.Range.Start, End:=.Range.End)

                With PgfRange
                    For Each CharObj In .Characters
                        With CharObj
                            TextChar = .Text
                            If TextChar <> Chr(13) And TextChar <> Chr(10) Then _
                                TextStr = TextStr & TextChar
                        End With
                    Next CharObj
                End With
            End With
        Next PgfObj
    End With

    Set CharObj = Nothing
    Set MainStory = Nothing
    Set PgfObj = Nothing

It still seems to execute rather slowly to me. On a 2 GHz Dell laptop it takes about 3 minutes to loop through a 40 KB file. Can you see anything else I could do to further optimize the code so it will run faster? Thanks!

Sincerely,
Dan

 
Martin Seelhofer

Hi Dan

From what I see in your code, you might as well just use the following
two lines of code in place of your many ones (no need for looping):

    TextStr = Replace(ActiveDocument.Content.Text, vbCr, "")
    TextStr = Replace(TextStr, vbLf, "")

Or am I missing something?


Cheers,

Martin

 
Jay Freedman

Hi, Dan,

Martin is absolutely correct. Going through the whole file character
by character is the most inefficient method possible.

The only further comment I have is that the Replace function was first
introduced in Word 2000, so anyone who's using Word 97 won't be able
to use Martin's code. Even there, it would be quicker to (a) make a
temporary copy of the document, (b) do two Find/Replace operations to
replace vbCr and vbLf with nothing, (c) set TextStr =
TempDoc.Content.Text, and (d) close the temp doc without saving it.
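
Steps (a) to (d) could be sketched roughly as follows. This is an illustration under stated assumptions, not tested on Word 97: the procedure names are invented, the copy is made by assigning FormattedText into a new document, and the Find codes ^p (paragraph mark) and ^l (manual line break) stand in for vbCr and vbLf.

```vba
' Hypothetical helper: delete every match of a Find code in a document.
Private Sub StripFindCode(ByVal Doc As Word.Document, ByVal FindCode As String)
    Dim oRg As Word.Range
    Set oRg = Doc.Content
    With oRg.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = FindCode
        .Replacement.Text = ""
        .Forward = True
        .Wrap = wdFindStop
        .MatchWildcards = False   ' ^p is only valid with wildcards off
        .Execute Replace:=wdReplaceAll
    End With
End Sub

Sub GetTextWithoutBreaks()
    Dim SrcDoc As Word.Document
    Dim TempDoc As Word.Document
    Dim TextStr As String

    Set SrcDoc = ActiveDocument

    ' (a) Temporary copy: a new document filled with the source content.
    Set TempDoc = Documents.Add
    TempDoc.Content.FormattedText = SrcDoc.Content.FormattedText

    ' (b) Two Find/Replace passes that replace the breaks with nothing.
    StripFindCode TempDoc, "^p"
    StripFindCode TempDoc, "^l"

    ' (c) Read the remaining text in a single call.
    TextStr = TempDoc.Content.Text

    ' (d) Discard the temporary document without saving it.
    TempDoc.Close SaveChanges:=wdDoNotSaveChanges

    Debug.Print Len(TextStr)
End Sub
```

Word always keeps the final paragraph mark of a document, so TextStr may still end with one vbCr that needs trimming.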
 
Dan

Jay,

Thanks to both you and Martin for your helpful feedback. I understand how your code is much simpler than the looping through each character that my code is doing.

I should have included a note in my previous message that the reason I am looping through character by character is that there is more to my code than what I've shown. I need to be able to know when formatting features such as bold and italic start and end. Is there any way to use the simpler code that you've described and still be able to reference formatting features that are included in the paragraph object?

Again, many thanks for your help.

Sincerely,
Dan
 
Jay Freedman

Hi, Dan,

To some extent, it depends on what you want to do with the formatting
information. For example, if you want to insert HTML-like tags in the text
stream, you can do that with Find/Replace operations like this sample:

    Dim oRg As Range
    Set oRg = ActiveDocument.Range
    With oRg.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Format = True
        .Font.Italic = True
        .Text = ""
        .Replacement.Text = "<italic>^&</italic>"
        .Forward = True
        .Wrap = wdFindStop
        .Execute Replace:=wdReplaceAll
    End With

[Note that the code ^& in the replacement means "the text that was found".]

If you just want to grab each chunk of text that has the same formatting,
you should store the .Start and .End values of the ranges you find with
format Finds.
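
A rough sketch of collecting those .Start and .End values, assuming italic is the format of interest. The procedure name is invented, and the results are written to the Immediate window with Debug.Print purely for illustration.

```vba
Sub ListItalicRuns()
    Dim oRg As Word.Range
    Dim DocEnd As Long

    Set oRg = ActiveDocument.Content
    DocEnd = oRg.End

    ' Set up a format-only Find: empty text, italic font.
    With oRg.Find
        .ClearFormatting
        .Format = True
        .Font.Italic = True
        .Text = ""
        .Forward = True
        .Wrap = wdFindStop
    End With

    ' Each successful Execute redefines oRg to the run that was found.
    Do While oRg.Find.Execute
        Debug.Print oRg.Start, oRg.End, oRg.Text

        ' Guard against searching again once the end of the document is reached.
        If oRg.End >= DocEnd Then Exit Do

        ' Continue the search from just after the run that was found.
        oRg.Collapse Direction:=wdCollapseEnd
    Loop
End Sub
```

Each pass through the loop handles one whole run of formatted text, so the number of calls into Word depends on the number of runs, not on the number of characters.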
 
Dan

Jay,

Again, thanks for your help. Is there a book that you would recommend that would describe in detail the kinds of programming techniques that you put in your last message? I don't do well learning concepts from help files that come with an application; I use the online help for details, but when it comes to overall concepts, I find it much easier and quicker to read a printed book.

Sincerely,
Dan
 
Jay Freedman

Hi, Dan,

The only book I have on VBA is Guy Hart-Davis, "Word 97 Macro & VBA
Handbook". The treatment of Find/Replace in Chapter 12 is not bad, if
a bit superficial. That book is now out of print, replaced by "Word
2000 Developer's Handbook", which I haven't seen but which has fairly
good reviews on Amazon.com.

There are other VBA books on the market, but I haven't looked at any
of them so I won't offer any opinions.

You can also look at the articles in the Macros/VBA section of
www.mvps.org.
 
Dan

Jay,

I inserted your suggestion into my program for replacing italics with <italic> and </italic> markers. It works fine. I then tried to change it to do the same for bold, changing only the .Font.Italic and .Replacement.Text lines, and my program gets into an infinite loop. Is there something I need to do differently for bold?

Sincerely,
Dan
 
Jay Freedman

All you should need to change is

.Font.Bold = True
and
.Replacement.Text = "<bold>^&</bold>"

There isn't any loop at all in the macro, so there shouldn't be any way to
get it into an infinite loop. If it seems that the macro stops responding,
press Ctrl+Break and tell me what line in the code is highlighted.
 
Dan

I am executing my program from within VB. When it appears to be in an infinite loop and I press Ctrl-Break, it highlights the End With statement at the end of the With .Find block.

If I try to re-execute the program, Word has been left in an open state from the previous execution, so the program behaves differently. Sometimes it will make the Word document visible on the screen, and then it appears that it is putting "<bold>" markers repeatedly at the beginning of a line of bold text, as though it's not really replacing the .Font.Bold coding with <bold>, but is somehow leaving the .Font.Bold coding in the document so that it's available for another replace, and another, etc.

Dan
 
Jay Freedman

Well, it's true that the macro doesn't *replace* the formatting with the
tags, it just adds the tags. If you want to get rid of the bold formatting
at the same time you add the tags, add the second line below:

.Replacement.Text = "<bold>^&</bold>"
.Replacement.Font.Bold = False

Even without that, though, when I run the same routine from VBA (not VB), it
replaces all the bold occurrences just once and stops (because of the
wdReplaceAll parameter in the .Execute).

You aren't by any chance calling the whole Find/Replace repeatedly from VB?
If so, don't -- just call it once.
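
For reference, a sketch of running the replacement exactly once from VB and then shutting Word down, so that no instance is left open for the next run. It assumes early binding (a project reference to the Word object library); the function name and the DocPath parameter are hypothetical.

```vba
Function TagBoldText(ByVal DocPath As String) As String
    Dim wdApp As Word.Application
    Dim wdDoc As Word.Document
    Dim oRg As Word.Range

    Set wdApp = New Word.Application
    ' Any error jumps to the cleanup code so Word is always closed.
    ' Note that this sketch does not report the error to the caller.
    On Error GoTo CleanUp

    Set wdDoc = wdApp.Documents.Open(DocPath)
    Set oRg = wdDoc.Content

    With oRg.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Format = True
        .Font.Bold = True
        .Text = ""
        .Replacement.Text = "<bold>^&</bold>"
        .Replacement.Font.Bold = False
        .Forward = True
        .Wrap = wdFindStop
        ' A single call; wdReplaceAll handles every occurrence.
        .Execute Replace:=wdReplaceAll
    End With

    ' Capture the tagged text before the document is discarded.
    TagBoldText = wdDoc.Content.Text

CleanUp:
    If Not wdDoc Is Nothing Then wdDoc.Close SaveChanges:=wdDoNotSaveChanges
    wdApp.Quit
    Set wdApp = Nothing
End Function
```

Closing the document without saving leaves the original file untouched; only the returned string carries the tags.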
 
Dan

Adding the line .Replacement.Font.Bold = False made the program execute to a normal termination. As far as I know, I'm not executing it repeatedly. It's exactly the same code that works fine with italics.

What I found, now that it executes properly, is that every time there is a piece of bold text, I get 52 <bold> markers before it and 52 </bold> markers after it. What would cause it to come up with more than one, and why would it be 52 every time?

Dan
 
Jay Freedman

Ummm... gremlins? As I said, I ran the routine within VBA and it did what I
expected -- one tag before and after each occurrence.

As for why it's 52... how many separate occurrences of bold text are there
in the original document? Was it 52? What happens if you run the same code
on a document with, say, 5 bold phrases?

Maybe if you post the VB code you're using, I can spot something.
 
Dan

Jay,

I guess we'll have to chalk it up to gremlins. I just reran the program and it only put in one <bold> and one </bold> for each occurrence. So I'm going to assume that it's working as I'd like for now.

I did order a copy of "Word 2000 Developer's Handbook" today, so hopefully I won't have to bother you as much in the future! Thanks very much for all of your help.

Sincerely,
Dan
 
Cindy M -WordMVP-

Hi Dan,

Dan said:
Is there a book that you would recommend that would describe in detail the kinds of programming techniques that you put in your last message?

My personal preference is the Word 2000 VBA Programmer's Reference from Wrox publishing. I find that the author does a better job of explaining and working with the Word object model than others I've seen.

Cindy Meister
INTER-Solutions, Switzerland
http://homepage.swissonline.ch/cindymeister (last update Sep 30 2003)
http://www.mvps.org/word

This reply is posted in the Newsgroup; please post any follow-up question or reply in the newsgroup and not by e-mail :)
 
Steve Hudson

G'day Dan,

No, but my email was down during the period (change of addy, install
probs galore etc). I did check paypal the moment I was back on & there
was 0 there - the notify failure must have kicked off an automatic
deny.

You can contact me direct using my business name WordHeretic at my new
ISP tpg.com.au if you need to. My book contains MANY tricks for
optimising and 'improving' your code.


