Reliable character counts without opening Word

Reed

I need to get reliable character counts from documents submitted in Word
format.

* Reading them from structured storage isn't reliable: fast saves are enabled
by default, and as far as I can tell the stored count isn't kept up to date.
* Word on Macs doesn't save using structured storage.

Opening Word is too expensive to do just to get counts with
doc.Characters.Count(). 5000 openings of Word crushed the system last year,
and no one wants to buy 10 more servers just to open Word. I have thought of
keeping Word running and opening/closing documents as necessary, but that is
still a lot of processor time. Aside from that, Characters.Count counts
paragraph symbols and other things that aren't characters in the final print
layout.

Is there a standard way to get character counts from Word documents composed
on different platforms without opening Word every time? And if I have to buy
10 copies of Word and 10 servers to get character counts, is there a way to
get them reliably without counting markup characters as characters?

TIA for any advice or direction.
~reed
 
Cindy M -WordMVP-

Hi Reed,

A possibility would be to use DSOFile.exe to pick up the document property. No
guarantee how accurate the value it returns will be, however. Word simply has
this "thing" about needing to lay out and repaginate in order to provide "full
fidelity".

And there's no way to change how the internal tool performs the character
count. A paragraph mark (carriage return) is a character, just like any other.
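
Since the built-in count cannot be changed, one option is to count outside Word on the extracted text. A minimal Python sketch, assuming the text arrives as a plain string in which paragraph marks appear as carriage returns. The set of marker characters below is an assumption about what you want to exclude, not something Word defines for you, and `count_printable` is a hypothetical helper name:

```python
# Characters treated as markup rather than printed text. This set is an
# assumption: carriage return (paragraph mark), line feed, cell marker,
# manual line break and page break. Adjust it to your own definition.
WORD_MARKS = {"\r", "\n", "\x07", "\x0b", "\x0c"}


def count_printable(text: str) -> int:
    """Count every character in text except the marker characters above."""
    return sum(1 for ch in text if ch not in WORD_MARKS)


sample = "First paragraph\rSecond paragraph\r"
print(len(sample), count_printable(sample))
```

Spaces are still counted here; add " " to the set if your definition of a character excludes them.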

"Fast saves" hasn't been enabled by default for the last couple of versions,
BTW.

Given what you say, I suspect your best bet would be:

1. Install Office 2003 on a dedicated machine. This is the most stable version,
and it supports an XML format.

2. Your app should start the application ONCE (not start and quit for every
document you process). Open the document, extract the XML and use a transform
on the WordML to pull out only the text (this will drop field codes, paragraph
marks and other things that you don't like). Then you can quickly determine the
Length of the stream the XSLT returns.

3. Close the document without saving. Execute a "Do events" to let things on
the system catch up, then process the next document.
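
The text-extraction part of step 2 can be sketched as follows. This is only an illustration in Python, and it walks the XML with the standard library's ElementTree instead of running an XSLT transform. It assumes the document is already available as a Word 2003 XML (WordML) string; getting that string out of Word is not shown. `count_text_characters` and `SAMPLE` are hypothetical names made up for the example:

```python
import xml.etree.ElementTree as ET

# Namespace of the Word 2003 XML (WordML) vocabulary.
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"


def count_text_characters(wordml: str) -> int:
    """Sum the lengths of all w:t text runs in a WordML document.

    Paragraph marks are structural (w:p elements) and field codes sit in
    w:instrText, so neither is counted. Every w:t in the document is
    counted, wherever it appears.
    """
    root = ET.fromstring(wordml)
    return sum(len(t.text or "") for t in root.iter(f"{{{W_NS}}}t"))


# Tiny hand-written sample: two paragraphs, one containing a field code.
SAMPLE = """<w:wordDocument
    xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <w:body>
    <w:p><w:r><w:t>Hello</w:t></w:r></w:p>
    <w:p>
      <w:r><w:instrText> PAGE </w:instrText></w:r>
      <w:r><w:t>World!</w:t></w:r>
    </w:p>
  </w:body>
</w:wordDocument>"""

print(count_text_characters(SAMPLE))
```

Whether spaces, field results or header and footer text should count is a policy decision; filter the w:t elements accordingly before summing.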

Cindy Meister
INTER-Solutions, Switzerland
http://homepage.swissonline.ch/cindymeister (last update Jun 17 2005)
http://www.word.mvps.org

This reply is posted in the Newsgroup; please post any follow-up question or
reply in the newsgroup and not by e-mail :)
 
