Very Slow character counting in Word 2003

E

Edgar E. Cayce

I am writing a character counting application which needs to go
through hundreds of Word documents, counting the characters in each
one. I need to do things like count double for bold/italic/underline
and count font transitions, etc., so just asking Word for a char count
does not do it for me.

When the bit of code that does the actual counting executes, it is
awfully slow - like 10-20 characters per second.

The app that calls this is VBA from Access 2003.

What I do is get a range for each "story" in the document (body,
headers, footers, etc.) and run the following code on the range (note
that Counter is a structure I use to hold the various counts). I find
that it takes several seconds between each debug printing that it has
counted 100 chars:

Dim BoldState As Boolean
Dim UnderlineState As Boolean
Dim ItalicState As Boolean
Dim LastFontName As String
Dim LastFontSize As Long
Dim LastFontColor As Long
Dim CharCounter As Long
Dim TheChar As Object

BoldState = False
UnderlineState = False
ItalicState = False
LastFontName = CurrentRange.Characters(1).Font.Name
LastFontSize = CurrentRange.Characters(1).Font.Size
LastFontColor = CurrentRange.Characters(1).Font.Color

For CharCounter = 1 To CurrentRange.Characters.Count
If CharCounter Mod 100 = 0 Then Debug.Print CharCounter
With CurrentRange.Characters(CharCounter)
If .Text = " " Then
Counter.Spaces = Counter.Spaces + 1
ElseIf .Text = vbTab Then
Counter.Tabs = Counter.Tabs + 1
ElseIf .Text = vbCr Then
Counter.Returns = Counter.Returns + 1
Else ' else it is a char, the only one where we care about
bold, font, etc.
Counter.Chars = Counter.Chars + 1
If .Case = wdUpperCase Then
Counter.CapitalChars = Counter.CapitalChars + 1
End If

' note for these, we check transition as well as
presence.
If .Font.Bold Then
Counter.BoldChars = Counter.BoldChars + 1
If BoldState = False Then
Counter.BoldTransitions =
Counter.BoldTransitions + 1
End If
BoldState = True
Else
If BoldState = True Then
Counter.BoldTransitions =
Counter.BoldTransitions + 1
End If
BoldState = False
End If

If .Font.Underline Then
Counter.UnderlineChars = Counter.UnderlineChars +
1
If UnderlineState = False Then
Counter.UnderlineTransitions =
Counter.UnderlineTransitions + 1
End If
UnderlineState = True
Else
If UnderlineState = True Then
Counter.UnderlineTransitions =
Counter.UnderlineTransitions + 1
End If
UnderlineState = False
End If

If .Font.Italic Then
Counter.ItalicChars = Counter.ItalicChars + 1
If ItalicState = False Then
Counter.ItalicTransitions =
Counter.ItalicTransitions + 1
End If
ItalicState = True
Else
If ItalicState = True Then
Counter.ItalicTransitions =
Counter.ItalicTransitions + 1
End If
ItalicState = False
End If

If .Font.Name <> LastFontName Then
Counter.FontTransitions = Counter.FontTransitions
+ 1
LastFontName = .Font.Name
End If
If .Font.Size <> LastFontSize Then
Counter.FontTransitions = Counter.FontTransitions
+ 1
LastFontSize = .Font.Size
End If
If .Font.Color <> LastFontColor Then
Counter.FontTransitions = Counter.FontTransitions
+ 1
LastFontColor = .Font.Color
End If

End If

End With
Next CharCounter


Can anyone help me figure out how to speed this up? I saw something
posted about how using Range.Characters(CharCounter) to index the
chars would be slow, so I tried a For Each [CharObj] in
Range.Characters instead, but it did not seem to be any faster - and
am I getting them in order when I do that?

Ed
 
D

DA

Don't know if you're already doing this, but try turning
off the screen updating during your routine.
Application.ScreenUpdating = False

Also have a look at this article:
http://word.mvps.org/faqs/interdev/MakeAppInvisible.htm
-----Original Message-----
I am writing a character counting application which needs to go
through hundreds of Word documents, counting the characters in each
one. I need to do things like count double for bold/italic/underline
and count font transitions, etc., so just asking Word for a char count
does not do it for me.

When the bit of code that does the actual counting executes, it is
awfully slow - like 10-20 characters per second.

The app that calls this is VBA from Access 2003.

What I do is get a range for each "story" in the document (body,
headers, footers, etc.) and run the following code on the range (note
that Counter is a structure I use to hold the various counts). I find
that it takes several seconds between each debug printing that it has
counted 100 chars:

Dim BoldState As Boolean
Dim UnderlineState As Boolean
Dim ItalicState As Boolean
Dim LastFontName As String
Dim LastFontSize As Long
Dim LastFontColor As Long
Dim CharCounter As Long
Dim TheChar As Object

BoldState = False
UnderlineState = False
ItalicState = False
LastFontName = CurrentRange.Characters(1).Font.Name
LastFontSize = CurrentRange.Characters(1).Font.Size
LastFontColor = CurrentRange.Characters(1).Font.Color

For CharCounter = 1 To CurrentRange.Characters.Count
If CharCounter Mod 100 = 0 Then Debug.Print CharCounter
With CurrentRange.Characters(CharCounter)
If .Text = " " Then
Counter.Spaces = Counter.Spaces + 1
ElseIf .Text = vbTab Then
Counter.Tabs = Counter.Tabs + 1
ElseIf .Text = vbCr Then
Counter.Returns = Counter.Returns + 1
Else ' else it is a char, the only one where we care about
bold, font, etc.
Counter.Chars = Counter.Chars + 1
If .Case = wdUpperCase Then
Counter.CapitalChars = Counter.CapitalChars + 1
End If

' note for these, we check transition as well as
presence.
If .Font.Bold Then
Counter.BoldChars = Counter.BoldChars + 1
If BoldState = False Then
Counter.BoldTransitions =
Counter.BoldTransitions + 1
End If
BoldState = True
Else
If BoldState = True Then
Counter.BoldTransitions =
Counter.BoldTransitions + 1
End If
BoldState = False
End If

If .Font.Underline Then
Counter.UnderlineChars = Counter.UnderlineChars +
1
If UnderlineState = False Then
Counter.UnderlineTransitions =
Counter.UnderlineTransitions + 1
End If
UnderlineState = True
Else
If UnderlineState = True Then
Counter.UnderlineTransitions =
Counter.UnderlineTransitions + 1
End If
UnderlineState = False
End If

If .Font.Italic Then
Counter.ItalicChars = Counter.ItalicChars + 1
If ItalicState = False Then
Counter.ItalicTransitions =
Counter.ItalicTransitions + 1
End If
ItalicState = True
Else
If ItalicState = True Then
Counter.ItalicTransitions =
Counter.ItalicTransitions + 1
End If
ItalicState = False
End If

If .Font.Name <> LastFontName Then
Counter.FontTransitions = Counter.FontTransitions
+ 1
LastFontName = .Font.Name
End If
If .Font.Size <> LastFontSize Then
Counter.FontTransitions = Counter.FontTransitions
+ 1
LastFontSize = .Font.Size
End If
If .Font.Color <> LastFontColor Then
Counter.FontTransitions = Counter.FontTransitions
+ 1
LastFontColor = .Font.Color
End If

End If

End With
Next CharCounter


Can anyone help me figure out how to speed this up? I saw something
posted about how using Range.Characters(CharCounter) to index the
chars would be slow, so I tried a For Each [CharObj] in
Range.Characters instead, but it did not seem to be any faster - and
am I getting them in order when I do that?

Ed
.
 
K

Klaus Linke

Hi Edgar,

For CharCounter = 1 To CurrentRange.Characters.Count
' Do something with CurrentRange.Characters(CharCounter)
is a terribly slow way to loop characters.
For each character, Word has to count through all the characters from the
first up to CharCounter to locate that character.
In large documents, this will take a time that is proportional to the
square of the number of characters.

Somewhat faster:
For Each charLoop in CurrentRange.Characters
' Do something with charLoop
This will take a time that is proportional to the number of characters.
(After one character has been processed, Word will simply go to the next
one)
But if you need to access the properties for each character (font,
underline ...) it still takes a long time.
Word keeps track of which ranges are, say, underlined. It takes some time
to figure out whether some particular character is in some underlined
range.

Much faster -- Avoid accessing individual characters using "Find/Replace":
Take a count of characters.
Then use "Find/Replace", to delete, say, underlined characters, and get the
count of characters again.
Take the difference. Then undo the deletion.
Word already "knows" the ranges that are underlined (for example).
So "Find/Replace" doesn't need to look at individual characters, nor figure
out how some specific range/character is formatted.

See http://word.mvps.org/faqs/macrosvba/GetNoOfReplacements.htm for some
example code.


Regards,
Klaus


Edgar E. Cayce said:
I am writing a character counting application which needs to go
through hundreds of Word documents, counting the characters in
each one. I need to do things like count double for bold/italic/
underline and count font transitions, etc., so just asking Word for
a char count does not do it for me.

When the bit of code that does the actual counting executes, it is
awfully slow - like 10-20 characters per second.

The app that calls this is VBA from Access 2003.

What I do is get a range for each "story" in the document (body,
headers, footers, etc.) and run the following code on the range
(note that Counter is a structure I use to hold the various counts).
I find that it takes several seconds between each debug printing
that it has counted 100 chars:

Dim BoldState As Boolean
Dim UnderlineState As Boolean
Dim ItalicState As Boolean
Dim LastFontName As String
Dim LastFontSize As Long
Dim LastFontColor As Long
Dim CharCounter As Long
Dim TheChar As Object

BoldState = False
UnderlineState = False
ItalicState = False
LastFontName = CurrentRange.Characters(1).Font.Name
LastFontSize = CurrentRange.Characters(1).Font.Size
LastFontColor = CurrentRange.Characters(1).Font.Color

For CharCounter = 1 To CurrentRange.Characters.Count
If CharCounter Mod 100 = 0 Then Debug.Print CharCounter
With CurrentRange.Characters(CharCounter)
If .Text = " " Then
Counter.Spaces = Counter.Spaces + 1
ElseIf .Text = vbTab Then
Counter.Tabs = Counter.Tabs + 1
ElseIf .Text = vbCr Then
Counter.Returns = Counter.Returns + 1
Else ' else it is a char, the only one where we care about
bold, font, etc.
Counter.Chars = Counter.Chars + 1
If .Case = wdUpperCase Then
Counter.CapitalChars = Counter.CapitalChars + 1
End If

' note for these, we check transition as well as
presence.
If .Font.Bold Then
Counter.BoldChars = Counter.BoldChars + 1
If BoldState = False Then
Counter.BoldTransitions =
Counter.BoldTransitions + 1
End If
BoldState = True
Else
If BoldState = True Then
Counter.BoldTransitions =
Counter.BoldTransitions + 1
End If
BoldState = False
End If

If .Font.Underline Then
Counter.UnderlineChars = Counter.UnderlineChars +
1
If UnderlineState = False Then
Counter.UnderlineTransitions =
Counter.UnderlineTransitions + 1
End If
UnderlineState = True
Else
If UnderlineState = True Then
Counter.UnderlineTransitions =
Counter.UnderlineTransitions + 1
End If
UnderlineState = False
End If

If .Font.Italic Then
Counter.ItalicChars = Counter.ItalicChars + 1
If ItalicState = False Then
Counter.ItalicTransitions =
Counter.ItalicTransitions + 1
End If
ItalicState = True
Else
If ItalicState = True Then
Counter.ItalicTransitions =
Counter.ItalicTransitions + 1
End If
ItalicState = False
End If

If .Font.Name <> LastFontName Then
Counter.FontTransitions = Counter.FontTransitions
+ 1
LastFontName = .Font.Name
End If
If .Font.Size <> LastFontSize Then
Counter.FontTransitions = Counter.FontTransitions
+ 1
LastFontSize = .Font.Size
End If
If .Font.Color <> LastFontColor Then
Counter.FontTransitions = Counter.FontTransitions
+ 1
LastFontColor = .Font.Color
End If

End If

End With
Next CharCounter


Can anyone help me figure out how to speed this up? I saw something
posted about how using Range.Characters(CharCounter) to index the
chars would be slow, so I tried a For Each [CharObj] in
Range.Characters instead, but it did not seem to be any faster - and
am I getting them in order when I do that?

Ed
 
H

Helmut Weber

Hi Edgar,
in addition to Klaus' advice, you may use
search and replace to get the number of
transitions, too:
Don't forget to reset search options before.
' Dim sTime As Long
' sTime = CLng(Timer)
Dim lItalTrns As Long ' transitions
Dim oRng As Range
Set oRng = ActiveDocument.Range
With oRng.Find
.Text = ""
.Font.Italic = True
While .Execute
lItalTrns = lItalTrns + 1
Wend
End With
' MsgBox CLng(Timer) - sTime
MsgBox lItalTrns * 2 ' !
Whereas an excerpt from your code on counting
transitions needs approximately 1 second for
1 page of my test-doc, the above macro needs
1 second for 100 pages. You might have to add
some lines of code for docs that start or end with
italic characters, depending on accuracy.
Greetings from Bavaria, Germany
Helmut Weber, MVP
"red.sys" & chr(64) & "t-online.de"
Word 2002, Windows 2000
 
E

Edgar E. Cayce

Helmut, Klaus, DA,

Thank you so much for your help.

I am wondering - I need to count font transitions, whenever the face,
color or size of the font changes. Since I don;t know what these
fonts are in advance, can you think of a way to use Helmut's Find
method to count these transitions?

Ed
 
K

Klaus Linke

Hi Edgar,

There are several ways to tackle that. I'm not sure what the best/fastest
way is.

You could check out Selection.SelectCurrentColor and .SelectCurrentFont.
While they require working with the Selection (don't work with Ranges) and
probably aren't terribly fast, they may well be sufficiently fast, and easy
to write a macro with.
Those are relics from WordBasic. It would be nice if VBA had similar
commands for Ranges, since I think Word should be able to execute such
commands pretty fast.

Another useful command might be WordBasic.SelectCurrentFormatting. This one
requires that "Keep track of formatting" is turned on.
Unfortunately, VBA has no good interface to the formatting "pseudo styles"
in the "styles and formatting pane (yet?).

Or you might start from the first character, examine its font, size, and
color, then collapse the Selection or Range to the start, and search for
these properties.
You'll get a range with similar formatting, and then can continue doing the
same with the following character.

The way I would probably try it:
-- Create a scratch copy of your document,
-- Search for the formatting of the first character, and replace with
"hidden" text plus tags around that (say "Replace with:
<f=Courier><c=Red><s=12pt>^&</s=12pt></c=Red></f=Courier>").
-- Then look for the next character that is not yet hidden, and repeat.
-- Finally, all the characters will be hidden and tagged, and it should be
easy to count the number of tags (for the total number of font
transitions), or to look for specific transitions using wildcard searches
(say "Find what: \</c=Black\>\</f[!\>]@\<f[!\>]@\</c=Red\>" to look where
the text changes from black to red).

Regards,
Klaus
 
K

Klaus Linke

The wildcard expressions had a few typos:
Find what: \</c=Black\>\</f[!\>]@\>\<f[!\>]@\>\<c=Red\>

Klaus
 
H

Helmut Weber

Hi Edgar,
"If anything is possible, nothing is possible!" Means,
some preliminary knowledge of the actual document structure
is essential. That knowledge has to be exploited.
Do you know, which fonts are in use?
Are there predefined character styles?
Is the variaty of colors in use somehow limited?
Does character size reach from 1 to 500 points? Rather not.
If the doc has been created by humans, then some regularity
is unavoidable. What is this regularity?
You could check first whether a range, preferably a paragraph,
containes different fonts, colors, sizes etc. at all.
Check the color, font, character size of a paragraph.
If the result is "undefined" then there are transitions.
Which, of course, is useful only, if transitions don't occur
in each paragraph anyway (see above).
Let us know, if you need further assistance.
 
W

Word Heretic

G'day Edgar E. Cayce <[email protected]>,

Hokay.

The best way to move along with ranges is NOT to use an iterated
collection. For example, do not use .Paragraphs(99). Every time you
access a member of the para, words, char etc collections, it is
DYNAMICALLY CALCULATED on the spot. This is important for BOTH of the
below solutions.

Find and Replace kicks the pooper out of this if you are able to use
such methodology. In a similar way, you collapse the find's range to
the end, then refind on whats left in the document. Always use the
smallest scope possible for your ranges.


Solution 1

This is the general purpose method that moves a range along

'Declare

Dim Scanner as Range
Dim DocEnd as Long


'Init

Set Scanner=ActiveDocument.Content
Scanner.Collapse wdCollapseStart

'Dont calc this every damn time, do it once

DocEnd=ActiveDocument.Content.End


'Sensible With structures enormously speed code

With Scanner


'Main loop

'We test for the end

While .Start < DocEnd

.MoveEnd


'Do your tests with Scanner

.Collapse wdCollapseEnd

wend
end with


Set Scanner=Nothing


Solution 2

I used a variation on this for my report on typefaces used in a doc.
It deletes as it goes, so iterating the start of the collection is
always very quick. It is usually only suitable for char by char
analyses.

You just always

With .Characters(1)

'blah blah

.Delete

end with

until the character count is 1 in that StoryRange. Corrupt documents
can play havoc here but that's not something I am willing to go into
at length here :) Hopefully it won't be neccesary in a handful of
years.



Steve Hudson - Word Heretic
Want a hyperlinked index? S/W R&D? See WordHeretic.com

steve from wordheretic.com (Email replies require payment)


Edgar E. Cayce reckoned:
 
E

Edgar E. Cayce

Steve,

Again, thanks to you and to the other helpful friends here at
microsoft.public.word.vba.general. I am going to give this a try and
let you guys know how it works out.

Can't wait to get your book - just ordered it from the wordheretic
website and I anticipate finding a lot of good stuff in there!

Ed
 
E

Edgar E. Cayce

Hmmm,

I tried method #2, where it constantly uses Characters(1) and deletes
from the beginning of the range each time.

I find that it is not any faster, and whats more, I am getting weird
results - all of the spaces are being seen as carriage returns, and my
count of spaces is now showing the number of sentences, not the number
of spaces... I think I must be doing something wrong here.

I start with a range called CurrentRange, and...

Dim CharCounter As Long
CharCounter = CurrentRange.Characters.Count

Do While CharCounter > 1
With CurrentRange.Characters(1)
If .Text = " " Then
Counter.Spaces = Counter.Spaces + 1
ElseIf .Text = vbTab Then
Counter.Tabs = Counter.Tabs + 1
ElseIf .Text = vbCr Then
Counter.Returns = Counter.Returns + 1
Else
Counter.Chars = Counter.Chars + 1
End If
' tests for italics, fonts, etc omitted.
CharCounter = CharCounter - 1
.Delete
End With
Loop

even skipping the font and formatting tests, this takes forever.

What am I missing?

Ed
 
E

Edgar E. Cayce

Hmmm,

I tried method #2, where it constantly uses Characters(1) and deletes
from the beginning of the range each time.

I find that it is not any faster, and whats more, I am getting weird
results - all of the spaces are being seen as carriage returns, and my
count of spaces is now showing the number of sentences, not the number
of spaces... I think I must be doing something wrong here.

I start with a range called CurrentRange, and...

Dim CharCounter As Long
CharCounter = CurrentRange.Characters.Count

Do While CharCounter > 1
With CurrentRange.Characters(1)
If .Text = " " Then
Counter.Spaces = Counter.Spaces + 1
ElseIf .Text = vbTab Then
Counter.Tabs = Counter.Tabs + 1
ElseIf .Text = vbCr Then
Counter.Returns = Counter.Returns + 1
Else
Counter.Chars = Counter.Chars + 1
End If
' tests for italics, fonts, etc omitted.
CharCounter = CharCounter - 1
.Delete
End With
Loop

even skipping the font and formatting tests, this takes forever.

What am I missing?

Ed
 
E

Edgar E. Cayce

Well, this is very weird.

I have not figured out what is making it so slow, but I have shed some
light on why my counts are coming up wrong. It appears that the
..Delete command, when deleting the last char of a word, deletes the
space after it as well. I figured this out by stepping through my
loop with .Visible set for the Word application. Only my double
spaces between sentences were being counted as spaces, all other
spaces were getting deleted along with the last letter of words.

And then, since 2 chars were being deleted for each word, my count of
chars is off so the paragraph marker at the end of the text gets
counted again and again...

Any idea how to keep it from deleting the spaces when deleting the
last char of a word?

Ed
 
E

Edgar E. Cayce

Well, this is very weird.

I have not figured out what is making it so slow, but I have shed some
light on why my counts are coming up wrong. It appears that the
..Delete command, when deleting the last char of a word, deletes the
space after it as well. I figured this out by stepping through my
loop with .Visible set for the Word application. Only my double
spaces between sentences were being counted as spaces, all other
spaces were getting deleted along with the last letter of words.

And then, since 2 chars were being deleted for each word, my count of
chars is off so the paragraph marker at the end of the text gets
counted again and again...

Any idea how to keep it from deleting the spaces when deleting the
last char of a word?

Ed
 
E

Edgar E. Cayce

Argh, am having problems with my news server - when I post it tells me
it failed, and then it posts twice.

Anyway, please excuse my double posts - Free Agent seems to have some
problems.

I tried my counter using WordHeretic's Method #1, and it works very
quickly - until I enable the code that checks italics, bold, fonts,
etc. Then it slows back down to as fast as my original code, or the
Method #2 code that deletes chars from the beginning of the range.

Note that to get Method #1 to work, I needed to change "DocEnd = .End"
to "DocEnd= .End - 1", or my loop just went forever. This may have to
do with the fact that I am counting each StoryRange in the doc
seperately, and they all seem to have a count of 1 more than is
visible - even empty ones such as footnotes (when I have none) have a
count of one char.

My Bold/Italic/etc. counting code is pretty basic, it's just

if .Text.Bold then
[count it...]
elseif .Text.Italic

Etc.

Any better way to do this?

Ed
 
W

Word Heretic

G'day Edgar E. Cayce <[email protected]>,

only by slow tests using instrrev

Steve Hudson - Word Heretic
Want a hyperlinked index? S/W R&D? See WordHeretic.com

steve from wordheretic.com (Email replies require payment)


Edgar E. Cayce reckoned:
 
W

Word Heretic

G'day Edgar E. Cayce <[email protected]>,

It will be as slow as your old method - but it will stay at that speed
all the way through - the other way gets slower AND slower :)

The main problem is you are constantly calculating .Text

Just do it once

Dim Text as String

....

Text=.Text

and use Text in all your If bits

Steve Hudson - Word Heretic
Want a hyperlinked index? S/W R&D? See WordHeretic.com

steve from wordheretic.com (Email replies require payment)


Edgar E. Cayce reckoned:
Argh, am having problems with my news server - when I post it tells me
it failed, and then it posts twice.

Anyway, please excuse my double posts - Free Agent seems to have some
problems.

I tried my counter using WordHeretic's Method #1, and it works very
quickly - until I enable the code that checks italics, bold, fonts,
etc. Then it slows back down to as fast as my original code, or the
Method #2 code that deletes chars from the beginning of the range.

Note that to get Method #1 to work, I needed to change "DocEnd = .End"
to "DocEnd= .End - 1", or my loop just went forever. This may have to
do with the fact that I am counting each StoryRange in the doc
seperately, and they all seem to have a count of 1 more than is
visible - even empty ones such as footnotes (when I have none) have a
count of one char.

My Bold/Italic/etc. counting code is pretty basic, it's just

if .Text.Bold then
[count it...]
elseif .Text.Italic

Etc.

Any better way to do this?

Ed


Well, this is very weird.

I have not figured out what is making it so slow, but I have shed some
light on why my counts are coming up wrong. It appears that the
.Delete command, when deleting the last char of a word, deletes the
space after it as well. I figured this out by stepping through my
loop with .Visible set for the Word application. Only my double
spaces between sentences were being counted as spaces, all other
spaces were getting deleted along with the last letter of words.

And then, since 2 chars were being deleted for each word, my count of
chars is off so the paragraph marker at the end of the text gets
counted again and again...

Any idea how to keep it from deleting the spaces when deleting the
last char of a word?

Ed
 
W

Word Heretic

G'day Edgar E. Cayce <[email protected]>,

Post the code Edgar, post the code :)

Steve Hudson - Word Heretic
Want a hyperlinked index? S/W R&D? See WordHeretic.com

steve from wordheretic.com (Email replies require payment)


Edgar E. Cayce reckoned:
 
E

Edgar E. Cayce

Steve,

So I can set a String variable to the text and still test for bold,
italic, fonts, etc? I thought that those are only properties of
Ranges, not Strings.

Is it possible to copy the data referred to by a range to a local
variable and keep those properties?

Ed

G'day Edgar E. Cayce <[email protected]>,

It will be as slow as your old method - but it will stay at that speed
all the way through - the other way gets slower AND slower :)

The main problem is you are constantly calculating .Text

Just do it once

Dim Text as String

...

Text=.Text

and use Text in all your If bits

Steve Hudson - Word Heretic
Want a hyperlinked index? S/W R&D? See WordHeretic.com

steve from wordheretic.com (Email replies require payment)


Edgar E. Cayce reckoned:
Argh, am having problems with my news server - when I post it tells me
it failed, and then it posts twice.

Anyway, please excuse my double posts - Free Agent seems to have some
problems.

I tried my counter using WordHeretic's Method #1, and it works very
quickly - until I enable the code that checks italics, bold, fonts,
etc. Then it slows back down to as fast as my original code, or the
Method #2 code that deletes chars from the beginning of the range.

Note that to get Method #1 to work, I needed to change "DocEnd = .End"
to "DocEnd= .End - 1", or my loop just went forever. This may have to
do with the fact that I am counting each StoryRange in the doc
seperately, and they all seem to have a count of 1 more than is
visible - even empty ones such as footnotes (when I have none) have a
count of one char.

My Bold/Italic/etc. counting code is pretty basic, it's just

if .Text.Bold then
[count it...]
elseif .Text.Italic

Etc.

Any better way to do this?

Ed


Well, this is very weird.

I have not figured out what is making it so slow, but I have shed some
light on why my counts are coming up wrong. It appears that the
.Delete command, when deleting the last char of a word, deletes the
space after it as well. I figured this out by stepping through my
loop with .Visible set for the Word application. Only my double
spaces between sentences were being counted as spaces, all other
spaces were getting deleted along with the last letter of words.

And then, since 2 chars were being deleted for each word, my count of
chars is off so the paragraph marker at the end of the text gets
counted again and again...

Any idea how to keep it from deleting the spaces when deleting the
last char of a word?

Ed


On Sun, 23 May 2004 13:28:27 +1000, Word Heretic

G'day Edgar E. Cayce <[email protected]>,

Hokay.

The best way to move along with ranges is NOT to use an iterated
collection. For example, do not use .Paragraphs(99). Every time you
access a member of the para, words, char etc collections, it is
DYNAMICALLY CALCULATED on the spot. This is important for BOTH of the
below solutions.

Find and Replace kicks the pooper out of this if you are able to use
such methodology. In a similar way, you collapse the find's range to
the end, then refind on whats left in the document. Always use the
smallest scope possible for your ranges.


Solution 1

This is the general purpose method that moves a range along

'Declare

Dim Scanner as Range
Dim DocEnd as Long


'Init

Set Scanner=ActiveDocument.Content
Scanner.Collapse wdCollapseStart

'Dont calc this every damn time, do it once

DocEnd=ActiveDocument.Content.End


'Sensible With structures enormously speed code

With Scanner


'Main loop

'We test for the end

While .Start < DocEnd

.MoveEnd


'Do your tests with Scanner

.Collapse wdCollapseEnd

wend
end with


Set Scanner=Nothing


Solution 2

I used a variation on this for my report on typefaces used in a doc.
It deletes as it goes, so iterating the start of the collection is
always very quick. It is usually only suitable for char by char
analyses.

You just always

With .Characters(1)

'blah blah

.Delete

end with

until the character count is 1 in that StoryRange. Corrupt documents
can play havoc here but that's not something I am willing to go into
at length here :) Hopefully it won't be neccesary in a handful of
years.



Steve Hudson - Word Heretic
Want a hyperlinked index? S/W R&D? See WordHeretic.com

steve from wordheretic.com (Email replies require payment)


Edgar E. Cayce reckoned:

Helmut, Klaus, DA,

Thank you so much for your help.

I am wondering - I need to count font transitions, whenever the face,
color or size of the font changes. Since I don;t know what these
fonts are in advance, can you think of a way to use Helmut's Find
method to count these transitions?

Ed


On Fri, 21 May 2004 02:56:05 -0700, "Helmut Weber"

Hi Edgar,
in addition to Klaus' advice, you may use
search and replace to get the number of
transitions, too:
Don't forget to reset search options before.
' Dim sTime As Long
' sTime = CLng(Timer)
Dim lItalTrns As Long ' transitions
Dim oRng As Range
Set oRng = ActiveDocument.Range
With oRng.Find
.Text = ""
.Font.Italic = True
While .Execute
lItalTrns = lItalTrns + 1
Wend
End With
' MsgBox CLng(Timer) - sTime
MsgBox lItalTrns * 2 ' !
Whereas an excerpt from your code on counting
transitions needs approximately 1 second for
1 page of my test-doc, the above macro needs
1 second for 100 pages. You might have to add
some lines of code for docs that start or end with
italic characters, depending on accuracy.
Greetings from Bavaria, Germany
Helmut Weber, MVP
"red.sys" & chr(64) & "t-online.de"
Word 2002, Windows 2000
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top