Hi Graham:
Yes. It's a tool called "Word".
Save the document as "Web Page" (better: as "XML").
Web Page saves as XHTML, which is somewhat easier for humans to read. XML
is easier for machines to read. Either of them will get you the entire
content of the document.
Alternatively, you can save the document as RTF. RTF is very close to
Word's native format. It's huge and very convoluted, but if you open RTF in
a text editor, you can see exactly what's in there.
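
If reading a huge RTF file by eye is too much, a short script can give a rough overview first. The following is only a sketch in Python, not a real RTF parser: the function name rtf_summary is my own invention, and it ignores \bin data and other corner cases. It counts the control words and reports how deeply the { } groups nest.

```python
import re
from collections import Counter

# An RTF control word: backslash, letters, optional signed number (e.g. \par, \fs24).
CONTROL_WORD = re.compile(r"\\([a-zA-Z]+)(-?\d+)?")

def rtf_summary(rtf_text):
    """Return (maximum group nesting depth, Counter of control words).

    Hypothetical helper for illustration; it does not handle \\bin data.
    """
    depth = 0
    max_depth = 0
    words = Counter()
    i = 0
    while i < len(rtf_text):
        ch = rtf_text[i]
        if ch == "\\":
            m = CONTROL_WORD.match(rtf_text, i)
            if m:
                words[m.group(1)] += 1
                i = m.end()
            else:
                # Control symbol such as \{ or \}: skip the escaped character
                # so that escaped braces are not counted as groups.
                i += 2
            continue
        if ch == "{":
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == "}":
            depth -= 1
        i += 1
    return max_depth, words
```

An unusually deep nesting level or a runaway count for one control word is at least a hint about where to start looking.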
Note that a Word document can include a large number of binary objects such
as graphics.
A couple of caveats:
1) If there's anything much wrong with the document, Automation cannot read
it either. Automation depends upon the internal collections in the document
object model being present and intact. If they're not, the object model
collapses.
2) Even if you get the data out, it becomes a huge job to try to analyse
what's wrong. Many of the problems you get in a Word document are due to
excessive levels of abstraction overflowing internal buffers. The code may
be "legal", but it becomes so complex that Word runs out of memory trying to
read it.
XML gives you your best shot: if you get the document out to XML, you can
read and correct most things if you know WordML very well. Regrettably, if
there's much wrong with the document, the XML output filter will fail to
complete the save.
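
If the save does complete, a script can take a first inventory of what is in the file before you start reading WordML by hand. This is a minimal sketch in Python, assuming the file is in the Word 2003 XML format (the namespace below); the function name wordml_inventory is hypothetical. It counts every element type and pulls out the text of each paragraph.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace of Word 2003 XML (WordML). Assumption: the document was saved
# in that format; other XML flavours from Word use different namespaces.
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"

def wordml_inventory(xml_data):
    """Return (Counter of element tags, list of paragraph texts).

    Hypothetical helper for illustration; xml_data may be str or bytes.
    """
    root = ET.fromstring(xml_data)
    # Count every element in the tree by its fully qualified tag name.
    tags = Counter(el.tag for el in root.iter())
    paragraphs = []
    for p in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is spread across the w:t elements in its runs.
        text = "".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t"))
        paragraphs.append(text)
    return tags, paragraphs
```

To try it on a saved file, pass the raw bytes, for example wordml_inventory(open("saved.xml", "rb").read()), where saved.xml stands for whatever name you gave the file. If the file is damaged badly enough, the parser will raise an error at the offending position, which is itself a clue.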
Sorry!
Folks:
Are there any useful tools for analyzing and troubleshooting Word docs?
I'm thinking along the lines of a tool that might read (via Automation) all
the data in the Word document model, and present all the data in some
intelligent fashion -- suitable for troubleshooting oddball problems.
Anything in that neighborhood?
Graham
--
Please reply to the newsgroup to maintain the thread. Please do not email
me unless I ask you to.
John McGhie <[email protected]>
Microsoft MVP, Word and Word for Macintosh. Consultant Technical Writer
Sydney, Australia +61 (0) 4 1209 1410