How to detect 2007 vs. 2003 documents?

darrel

Apologies if this isn't the right newsgroup for this question...if it's not,
please point me to a better forum for this.

The issue:

Our internal users are using Office 2007. They need to publish Word files to
our public site via our CMS. We've told folks to not post docx files but
instead 'back save' to Office 2003 doc files as a lot of the public has not
upgraded to 2007 or has an office suite that can't yet convert 2007 files.

The problem is that some of our users, instead of using the SAVE AS feature,
simply renamed their .docx extensions to .doc.

So, we now have docx files littered throughout our site with .doc
extensions.

Is there any way to tell these files apart from actual 2003 .doc files?
Windows, once you change the file extension, assumes the file extension is
correct. Short of renaming each and every one to .zip to see if they
decompress as docx, is there anything we can do to find the improperly
named files?

-Darrel
 
SvenC

Hi darrel,
So, we now have docx files littered throughout our site with .doc
extensions.

Is there any way to tell these files apart from actual 2003 .doc files?

Mine all start with hex 50 4b 03 04, which I guess is a zip header.
So you might want to write an app that finds all *.doc files, checks
the first 4 bytes, and renames those that show the above pattern to
.docx.
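
A minimal sketch of that idea in Python, in case it helps. The names
find_misnamed_docs and rename_to_docx are made up for illustration, and
it assumes the published files sit in an ordinary directory tree you can
read. It only looks at the first 4 bytes, so any zip archive named .doc
would match as well.

```python
from pathlib import Path

# hex 50 4b 03 04: the signature at the start of a zip archive
ZIP_MAGIC = b"PK\x03\x04"


def find_misnamed_docs(root):
    """Return the *.doc files under root whose first 4 bytes are the zip signature."""
    hits = []
    for path in sorted(Path(root).rglob("*.doc")):
        if not path.is_file():
            continue
        with path.open("rb") as fh:
            if fh.read(4) == ZIP_MAGIC:
                hits.append(path)
    return hits


def rename_to_docx(paths):
    """Rename each path to .docx, skipping any whose target name already exists."""
    renamed = []
    for path in paths:
        target = path.with_suffix(".docx")
        if target.exists():
            continue  # leave it alone rather than overwrite another file
        path.rename(target)
        renamed.append(target)
    return renamed
```

Calling find_misnamed_docs on the site's document folder gives you the
list to review first; pass that list to rename_to_docx only once you are
happy with it.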
 
David Lightstone

Whether there is a way to distinguish or not is relatively unimportant. All
the 2007 format files will have to be converted to 2003 format.
Write a Word 2007 VBA application that opens (one by one) every Word
document in a directory, and then saves it to another directory in Word 2003
format.
 
macropod

Hi Darrel,

Changing the file extension is a BAD MOVE. Doing so doesn't change the files to the Word '97-2003 format, and may even cause
problems with Word 2007. Word 2007's docx and docm formats are actually zip archives containing compressed xml files, nothing like
the '97-2003 file format.

So, yes, if you've changed the file extensions, you're going to have to test every Word file dated on or after the day you installed
Word 2007 to see which ones are actually in the new format, then change their extensions back again. To make this easier, if you
write a macro to read the first two characters of any suspect file, the Word 2007 files start with the characters 'PK', whereas
files in the older format start with the characters 'ÐÏ' (bytes D0 CF). You can then re-save the files in the Word '97-2003
format.
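
Here is a rough Python sketch of that two-way test, for anyone who would rather not do it as a Word macro. The function name
classify_word_file and the labels it returns are made up for illustration. It assumes that an old-format file begins with the full
8-byte OLE2 compound-file signature (which starts with D0 CF), and that a file saved by Word 2007 is a zip containing a part named
word/document.xml. Other old Office formats share the OLE2 signature, so "ole2" means "probably a '97-2003 document", not a guarantee.

```python
import zipfile

# OLE2 compound-file signature; its first two bytes, D0 CF, display as 'ÐÏ'
OLE2_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")


def classify_word_file(path):
    """Return 'docx', 'zip-other', 'ole2' or 'unknown' based on the file's content."""
    with open(path, "rb") as fh:
        head = fh.read(8)

    if head[:2] == b"PK":
        # Looks like a zip; confirm it opens and check for the main Word part.
        if not zipfile.is_zipfile(path):
            return "unknown"
        try:
            with zipfile.ZipFile(path) as zf:
                names = zf.namelist()
        except zipfile.BadZipFile:
            return "unknown"
        return "docx" if "word/document.xml" in names else "zip-other"

    if head == OLE2_MAGIC:
        return "ole2"

    return "unknown"
```

Any file named .doc that comes back as "docx" is one of the renamed ones and still needs a proper re-save.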

For future reference, Word 2007 can be configured to save files in the Word '97-2003 format by default, without the users having to
resort to File|Save As - see under Word Options|Save (easily accessed via Alt-t, o).

Cheers
 
