Determining Revision Date in the XML

dwotx

I am trying to read some Word 2003 documents (using Word 2007 automation)
from the web and determine their last modification dates. I'm wondering if
anyone has a simpler way than the one I've been able to figure out.

First, I have to download the file in question to a temporary file as I've
been unable to figure out how to get Word automation to read an incoming
document from a stream. Its Open method seems to accept a path name only
(not a URL). The process of saving the document to a temporary file means
there are creation dates/modification dates *associated* with the file, but
the internal info in the WordML is of course unchanged.
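Outside Word itself, getting a downloaded (or embedded-resource) stream onto disk so that a path-only API can open it takes only a few lines. The following is a minimal Python sketch rather than the .NET code the thread is using, and the function name `save_stream_to_temp` is hypothetical. It can also stamp the temp file with a known modification time, so that any date Word might pick up from the file system is not simply the moment of download.

```python
import os
import shutil
import tempfile
from typing import BinaryIO, Optional


def save_stream_to_temp(stream: BinaryIO, suffix: str = ".doc",
                        mtime: Optional[float] = None) -> str:
    """Copy a binary stream into a new temporary file and return its path.

    Hypothetical helper; the caller must delete the file when finished.
    If mtime (seconds since the Unix epoch) is given, it is applied as both
    the access and modification time of the temp file.
    """
    fd, path = tempfile.mkstemp(suffix=suffix)
    # Wrap the raw descriptor so the file is closed even if copying fails.
    with os.fdopen(fd, "wb") as out:
        shutil.copyfileobj(stream, out)
    if mtime is not None:
        # os.utime sets access and modification times only.
        os.utime(path, (mtime, mtime))
    return path
```

For a web source, the stream could be the response object from `urllib.request.urlopen(url)`. If the server sends a `Last-Modified` header, `email.utils.parsedate_to_datetime(...).timestamp()` yields a value suitable for `mtime`. Creation time cannot be set portably this way.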

Next, I create an instance of ApplicationClass and use that to open a new
document giving it the pathname to the temp file. Presumably, Word does not
substitute a creation/modification date taken from the file system during
this process, but I actually don't know.

Next, I want to use an xsl transform to search the WordML and pick out the
revision date, creation date, etc. The transform is easy to write, but
unfortunately, I maintain it as an embedded resource in my VS Project. That
in turn means that I reference it using the Assembly.GetExecutingAssembly
method followed by the GetManifestResourceStream method to get the file I want.
Sadly, that too provides a stream, and Word's TransformDocument method again
only accepts a path ... you get the picture.

First, has anyone found a better way to get the information I'm
seeking? I'd much appreciate it. Second, does anyone know for sure that
Word is not using modification dates from the file system representing the
temp file? Third, since I haven't yet got to the point of running the transform,
I'm assuming it will run against an OpenXML view of the document because
I'm using the 2007 classes (all transformation from 2003's WordML has
already been done). Is that right?

Any insights appreciated.

--Don
 
Peter Jamieson

If you're starting from documents you know are in Word 2003
WordProcessingML format, why not use an XSLT to grab the information
directly from the document? i.e., you don't need to open the document in
Word at all - it's just an XML document. Unlike Office 2007 OOXML files,
these are /real/ XML documents, not XML documents in a ZIP wrapper.
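As a sketch of reading the dates straight from a Word 2003 XML file without Word: this swaps the suggested XSLT for Python's standard ElementTree path queries, since the Python standard library has no XSLT processor. It assumes, based on the 2003 schema as I understand it, that the dates sit in `o:Created` and `o:LastSaved` under `o:DocumentProperties`, with `o` bound to `urn:schemas-microsoft-com:office:office`; check that against a real sample file. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import BinaryIO, Dict, Union

# Namespace assumed for the document-property elements in Word 2003 XML.
O_NS = "urn:schemas-microsoft-com:office:office"


def read_wordml_dates(source: Union[str, BinaryIO]) -> Dict[str, str]:
    """Return the Created/LastSaved strings found in a WordprocessingML file.

    Hypothetical helper; source may be a path or a binary file object.
    Missing elements are simply left out of the result.
    """
    root = ET.parse(source).getroot()
    # Search descendants, in case the properties are not a direct child.
    props = root.find(f".//{{{O_NS}}}DocumentProperties")
    result: Dict[str, str] = {}
    if props is None:
        return result
    for name in ("Created", "LastSaved"):
        el = props.find(f"{{{O_NS}}}{name}")
        if el is not None and el.text:
            result[name] = el.text.strip()
    return result
```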

If they aren't already in WordProcessingML format, then you may be
labouring under a misunderstanding, because binary .doc documents are not
stored internally in XML format and would only "become" XML documents when
you actually saved them in that format.

Peter Jamieson

http://tips.pjmsn.me.uk
 
dwotx

Peter,
Thanks. Unfortunately, my source is not in XML format. I was trying to
get to XML without having Word modify the key dates during a save. (I've
since learned that Word may modify the dates during the open, making the
problem worse.)
Thanks,
--Don
 
Peter Jamieson

Yes, I don't know what info. you can discover once Word has opened the
document.

If someone forced me to write something like this I'd probably try and
get to grips with the binary .doc format, which is now published (with
conditions). But it looks like a heck of a slog to get to a point where
you could actually get any info. out of it.
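For anyone who does go down the binary route: in the published OLE property-set format, the SummaryInformation stream stores the creation and last-saved times (PIDSI_CREATE_DTM, PIDSI_LASTSAVE_DTM) as FILETIME values, i.e. 64-bit little-endian counts of 100-nanosecond intervals since 1601-01-01 UTC. The sketch below covers only that final conversion step; locating the stream inside the compound file is the real slog and is not shown. Whether the stored values can be trusted is a separate question. The function name is hypothetical.

```python
import struct
from datetime import datetime, timedelta, timezone

# FILETIME counts 100-nanosecond ticks since 1601-01-01 00:00 UTC.
_FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(raw: bytes) -> datetime:
    """Convert an 8-byte little-endian FILETIME to a timezone-aware UTC datetime."""
    if len(raw) != 8:
        raise ValueError("FILETIME must be exactly 8 bytes")
    (ticks,) = struct.unpack("<Q", raw)
    # 10 ticks per microsecond; sub-microsecond precision is dropped.
    return _FILETIME_EPOCH + timedelta(microseconds=ticks // 10)
```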

Peter Jamieson

http://tips.pjmsn.me.uk
 
Tony Jollans

> Presumably, Word does not
> substitute a creation/modification date taken from the file system during
> this process, but I actually don't know.

I think you'll find that that is all that it does and that there is no
reliable date anywhere inside the file.
 