"Get Info" kb/word count discrepancy

Montserrat

w2001
os9.2.2

Hi,

In "Get Info" for two documents on the HD, why is one document that has
95,000 words given a size of 1.8MB while another document with 125,000 words
is given a size of 832k?

Both have the same font and font size.

Rafael
 
John McGhie [MVP - Word and Word Macintosh]

Hi Rafael:

There is no direct arithmetic relationship between the text content and the
file size of a Word document.

The binary file format is an Object-Linking and Embedding "container"
segmented into various data types. The various components are all linked
using binary pointers.

Many of these data types are compressed: the text is one container that is
compressed.

A document formatted entirely with styles will be substantially smaller than
one formatted with direct formatting. A document that has had a lot of
editing will be larger than one that hasn't.

The variation is huge.

Word 2001 is still susceptible to the "Stranded RTF" bug. Bits of document
(often, deleted text) can become unlinked within the file. When that
happens, Word can't clean them up because it can no longer see them. These
things remain in the file. You can remove them by doing a "Maggie" on the
document -- copy all except the last paragraph mark to a new document.

If you try that, you may see a dramatic reduction in the file size. If the
file size doesn't change significantly, chances are the bulk of the document
is normal formatting overhead.

Hope this helps

--

Please reply to the newsgroup to maintain the thread. Please do not email
me unless I ask you to.

John McGhie <[email protected]>
Microsoft MVP, Word and Word for Macintosh. Consultant Technical Writer
Sydney, Australia +61 4 1209 1410
 
Klaus Linke

The "Maggie" would also get rid of the forgotten company logo in the header
;-)

Unicode vs. non-Unicode might also be responsible for a factor of two in the
file size.
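
To illustrate the factor of two, here is a minimal Python sketch. It assumes non-Unicode text takes one byte per character (Mac Roman is used as a stand-in) and Unicode text takes two bytes per character (UTF-16). It only compares byte counts for a sample string; it does not show how Word actually lays out its file.

```python
# Sample text; any string made of ordinary Western characters will do.
sample = "Both have the same font and font size."

# Assumption: one byte per character for non-Unicode storage.
one_byte = sample.encode("mac_roman")

# Assumption: two bytes per character for Unicode storage.
two_byte = sample.encode("utf-16-le")

ratio = len(two_byte) / len(one_byte)
print(len(one_byte), "bytes vs.", len(two_byte), "bytes, ratio", ratio)
```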

As John said, there are many possible explanations. As a rule of thumb, the
file size should be roughly one byte per character (two if the text is stored
as Unicode) plus 19 kB for the header plus a bit of overhead for formatting,
tables, fields and so on.
If the actual size is off by an order of magnitude, you could save as HTML
or RTF and look at that in a text editor. That might give you an idea where
the ballast comes from.
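
As a rough worked example of that rule of thumb, applied to the two documents in the question: the sketch below assumes an average of 6 characters per word including the space (my guess, not a measured value) and ignores the formatting overhead entirely. The function name is made up for illustration.

```python
HEADER_BYTES = 19 * 1024   # the "19 kB for the header" from the rule of thumb
CHARS_PER_WORD = 6         # assumption: average word length plus one space


def estimate_size_bytes(words, bytes_per_char=1):
    """Rough size estimate: text bytes plus header, no formatting overhead."""
    return words * CHARS_PER_WORD * bytes_per_char + HEADER_BYTES


# Word counts taken from the original question.
for words in (95_000, 125_000):
    for bytes_per_char in (1, 2):   # 1 = non-Unicode, 2 = Unicode
        kb = estimate_size_bytes(words, bytes_per_char) / 1024
        print(f"{words} words at {bytes_per_char} byte(s)/char: about {kb:.0f} kB")
```

Under these assumptions the 125,000-word document comes out at about 750 kB at one byte per character, which is in the neighbourhood of the reported 832k. The 95,000-word document stays under about 1.2 MB even at two bytes per character, so a size of 1.8MB would point to something beyond plain text.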

The new compressed file formats in future versions of Office should nearly
remove the difference between Unicode/non-Unicode, and generally reduce file
sizes considerably:
http://www.microsoft.com/presspass/press/2005/jun05/06-01OfficeXMLFormatPR.mspx

Regards,
Klaus
 
Matt Centurión [MSFT]

Repeat for each document:
1) Open it
2) Choose "File | Properties"
3) Uncheck "Save preview picture"
4) Choose "File | Save As"
5) Save to a new name

The sizes should then better reflect the content (taking into account John's
comments, and any embedded OLE objects or other graphics).

Matt Centurión
Macintosh Business Unit, Microsoft

--
This posting is provided "AS IS" with no warranties, and confers no rights.

Find out everything about Microsoft Mac Newsgroups at:
http://www.microsoft.com/mac/community/community.aspx?pid=newsgroups
Check out product updates and news & info at:
http://www.microsoft.com/mac