Martin Brown
HELP!! I have tried the MSKB but sadly found nothing obvious.
The situation is that various groups generate technical documents on a
mixture of Win 2k and XP platforms. The problem is that under some as yet
undetermined circumstances, when small images or illustrations are
included, the filesize grows in an unbounded manner.
The sort of thing I mean is that a minor report (60kb of text with 100kb
of drawings) was rejected by the email system because the Word DOC file
was 40MB.
I have established that most of the hit comes from huge OLE data being
included at some point. And then at some other point along the workflow
an incompatibility sometimes occurs that spontaneously doubles the
filesize again.
That is, for every included image or picture, and for the OLE data, a
twin is created ending with "_". Here is a small example.
e.g. Directory of C:\qwerty_image_files
25/01/2006  15:58    <DIR>          .
25/01/2006  15:58    <DIR>          ..
25/01/2006  15:58               234 filelist.xm_
25/01/2006  15:58               234 filelist.xm~
25/01/2006  15:58             1,512 image001.gif
25/01/2006  15:58             2,084 image003.wmz
25/01/2006  15:58             1,512 image004.gi~
25/01/2006  15:58         3,760,186 oledata.ms_
25/01/2006  15:58         3,760,186 oledata.ms~
               7 File(s)  7,525,948 bytes
7.5MB for a short document containing one tiny 1500 byte GIF !!!
These odd Word documents contain more than 99.9% wasted space!
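For anyone wanting to check other exported folders the same way, here is a minimal Python sketch that totals the bytes in an export folder and reports how much of it sits in the oledata.* files. The function names `summarize_export` and `ole_fraction` are made up for illustration, and it assumes the folder is a flat "_files" directory like the one listed above. For that listing the two oledata files account for 7,520,372 of the 7,525,948 bytes, roughly 99.9%.

```python
import os
from collections import defaultdict


def summarize_export(folder):
    """Return (bytes_by_kind, total_bytes) for the files directly inside folder.

    Files whose name starts with "oledata." are counted as "ole";
    every other file is counted as "other". Subdirectories are ignored.
    """
    bytes_by_kind = defaultdict(int)
    for entry in os.scandir(folder):
        if not entry.is_file():
            continue
        kind = "ole" if entry.name.lower().startswith("oledata.") else "other"
        bytes_by_kind[kind] += entry.stat().st_size
    total = sum(bytes_by_kind.values())
    return dict(bytes_by_kind), total


def ole_fraction(folder):
    """Fraction of the folder's bytes held in oledata.* files (0.0 if empty)."""
    by_kind, total = summarize_export(folder)
    return by_kind.get("ole", 0) / total if total else 0.0
```

Calling `ole_fraction(r"C:\qwerty_image_files")` on a folder like the one above would give a quick figure for how much of the export is OLE baggage rather than text or pictures.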
I hope that the magic number "3,760,186" is a give-away about the root
cause of this massive explosion in size. I suspect drag & drop...
Properties reports that the offenders claim to be of the normal type
"Microsoft Word 97-2002 Document".
Some reports are now reaching 200MB in size despite the fact that their
true information content is under 500kb.
Exporting the entire document to HTML format, then deleting oledata.mso
and oledata.ms_ and opening what is left produces a new file with normal
size but with the original text formatting somewhat mutilated.
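The manual clean-up step after the HTML export could be scripted along these lines. This is only a sketch: `strip_oledata` is a hypothetical helper name, it assumes the oledata files sit directly in the export folder, and it does not touch filelist.xml or the HTML itself. It defaults to a dry run so nothing is deleted until the list has been checked.

```python
from pathlib import Path


def strip_oledata(folder, dry_run=True):
    """List, and optionally delete, the oledata.* files directly inside folder.

    Returns the sorted names of the matching files. With dry_run=True
    (the default) nothing is deleted.
    """
    targets = sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.name.lower().startswith("oledata.")
    )
    if not dry_run:
        for p in targets:
            p.unlink()
    return [p.name for p in targets]
```

Run it once with the default `dry_run=True` to see what would go, then again with `dry_run=False` on a copy of the export before reopening the HTML in Word.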
I thought I had a solution with a script that deleted and recreated
every image in a document. But for some recent documents this fix no
longer works and they remain stubbornly obese 40MB files with 200kb of
useful content. Deleting *all* the images at once restores normality.
I would be grateful for any pointers on where to look next. I cannot
reproduce this malady on my own machines, and I have yet to witness what
it is the authors do to trigger this problem. They claim that nothing
has changed at their end.
Thanks for any pointers or suggestions on what to look for or try next.
Regards,
Martin Brown