Metadata on Office documents

khai

Hi,

I am programmatically trying to check if a file has been modified by doing a
check on the metadata of the "before" and "after" files. I am using the md5
hash to do this, and it works with pdf and txt files. However, in comparing
Office documents like Word and PowerPoint, the hashing returns different
results even though the "after" file has not been modified. I did a check on
the metadata and found that the dates (last modified, created, etc.) are
different for the "before" and "after" files, and I believe that this is
the reason for the different hashing results.

Before the actual hashing is done, the "after" file will be saved at a temp
location on the server. Is this the probable reason why the dates in the
metadata differ? If so, is there a workaround for this?

I hope this is the right place to post my question on this issue. =)
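
One way to narrow this down: an MD5 hash is computed over the file's bytes only, so file-system timestamps on their own cannot change it. If the hashes differ, the bytes differ, which would point at dates stored inside the Office file itself rather than at the temp-location copy. A minimal sketch in Python (not the poster's language; the function names are hypothetical) that hashes a file, changes its file-system timestamps, and hashes it again:

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of the file's bytes, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_survives_timestamp_change(data=b"example document bytes"):
    """Write data to a temp file, hash it, alter its timestamps, hash again."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        before = md5_of_file(path)
        # Set access and modified times to an arbitrary past moment (assumed value)
        os.utime(path, (1_000_000_000, 1_000_000_000))
        after = md5_of_file(path)
        return before == after
    finally:
        os.remove(path)
```

If `hash_survives_timestamp_change()` returns True, the changed hash in the Office case has to come from the file contents.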
 
macropod

Hi khai,

If a Word file is saved using File|Save as, the creation date necessarily changes - even if the name and contents don't.

If someone starts editing a Word file, undoes those edits and then re-saves the now 'unedited' file, the creation date and content
remain the same, but the last-modified date changes.

Note too that, while a file is open, the last-modified date changes, even though no modifications might have been made. If the file
is closed without saving, the last-modified date reverts to its former value. Obviously, though, the last-accessed date would
change.
 
khai

Hmm, so metadata checking on Office documents is not viable for my
requirements.

I have tried byte-by-byte checking (i.e. FileStream.ReadByte()), but it takes
too long.

Is there another way to check for file modifications?

 
macropod

Hi khai,

Given a changed MD5 value:
.. If the file-size differs, there's obviously been a change in content.
.. If the file-size is unchanged, you'll need to test the file's contents to establish whether there's been a change in content. With
Word, that may mean nothing more than that a Table-of-Contents field has been updated (e.g. as a result of printing the document) but
no actual change in content occurred.

--
Cheers
macropod
[MVP - Microsoft Word]
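
The size-then-contents check described above can be sketched roughly as follows. This is an illustration in Python rather than the .NET code used in the thread, and `compare_files` and its return strings are hypothetical names. It only reports where the bytes first differ; deciding whether that difference is a real content change (as opposed to, say, an updated field) still needs inspection of the document itself.

```python
import os


def compare_files(before_path, after_path, chunk_size=65536):
    """Compare two files: size first, then bytes, one chunk at a time.

    Returns "size differs", "identical", or
    "same size, first difference at byte N".
    """
    # Cheap check first: different sizes mean the bytes cannot match
    if os.path.getsize(before_path) != os.path.getsize(after_path):
        return "size differs"

    offset = 0
    with open(before_path, "rb") as before, open(after_path, "rb") as after:
        while True:
            chunk_a = before.read(chunk_size)
            chunk_b = after.read(chunk_size)
            if chunk_a != chunk_b:
                # Locate the first differing byte inside this chunk
                for index, (x, y) in enumerate(zip(chunk_a, chunk_b)):
                    if x != y:
                        return f"same size, first difference at byte {offset + index}"
                # Chunks differ only in length (a file changed while being read)
                shorter = min(len(chunk_a), len(chunk_b))
                return f"same size, first difference at byte {offset + shorter}"
            if not chunk_a:
                # Both files ended at the same point with no mismatch
                return "identical"
            offset += len(chunk_a)
```

Reading in chunks stops at the first mismatch and avoids a per-byte read call, which may help with the speed problem mentioned earlier.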


 
khai

Hi macropod,

Thanks for the suggestion. I have actually tried a byte comparison method,
albeit with File.ReadAllBytes(). So far, it seems to work just fine. I
hope it remains so.

Thanks. =)
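
A whole-file read like this holds both files in memory at once, which could become a problem with very large documents. As a point of comparison, Python's standard library has a content comparison that streams the files instead. The sketch below is a Python analogue, not the .NET code from this thread, and `same_bytes` is a hypothetical wrapper name:

```python
import filecmp


def same_bytes(before_path, after_path):
    """Return True when both files contain exactly the same bytes."""
    # shallow=False makes filecmp compare the file contents instead of
    # accepting a match on the os.stat() signature (type, size, mtime)
    return filecmp.cmp(before_path, after_path, shallow=False)
```

Like any byte comparison, this still reports a difference when only data embedded in the file (such as stored dates) has changed.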

 
