The Ever-Expanding Word Document

Plumer

A very warm South Pacific greeting to you all: Kei te mihi! Kei te mihinui tatou!

Back in April I posted an item which enquired about:

"..material that Word documents carry around with them that most people don't know about -- personal information and revisions and maybe other stuff. I know about the personal information but the other material referred to was of great interest since document discovery (in a legal sense) is germane to my business and I am keen to understand the inner workings so that I can know what I'm giving to someone electronically.."

That issue is, I suspect, related to another difficulty that I have which I will refer to as the ever-expanding-document (EXD). In the course of my business I must produce each month a set of papers for the board meeting. The papers are a standard Word doc. To produce the papers for month N, I copy the papers for month N-1, remove the stuff that is not relevant and replace it with material that is. This has been going on for a couple of years now and effectively what has happened is that the original document has left an always-changing trail of monthly copies of itself on my hard drive. Allowing for some ups and downs of content, the document has continually expanded through this process. It has now arrived at a point where the reported size of the doc is significantly larger than what I would expect given the content of the doc.

1. There are no graphics of any kind in the doc.

2. There is a Page 1 header which is all text. Pages 2 to N do not have headers but they do have footers.

3. There are a few tables (typically about 6) which have a max of about 10 rows (in a really extreme case, maybe 20).

4. I use outline numbering.

5. The most recent version of the doc is 5 pages long, has a character count of 5013, 279 paragraphs.

6. Despite this seemingly harmless background, the file is over 310KB in size.

Even allowing for the Word overhead this still seems excessive. The text of what you are reading was written in Word and at this point I saved the doc, noted the word count and the doc size -- the results are:

1. Word count: 1581

2. Document size: 21KB

What gives?? There seems to be a lot of "junk DNA" in my EXD (ever-expanding-document) and I'd like to know how to get rid of it. What is causing it? Where is this crap being collected? What is it and how do I remove it from the document?

This is not just an intellectual exercise. These documents have to be sent via email to a large number of people who are -- to all intents and purposes -- computer illiterate and the size of the document causes a lot of problems. Use your imagination...

Any suggestions on how the "junk DNA" can be removed would be very gratefully received. So you know the background, I'm a very competent and informed (though obviously incompletely...) user of Word. I'm very comfortable with Word VB and the manipulation of docs using that mechanism.

Anyway, please help me improve my knowledge.

Kia ora tatou
Plumer
 
Doug Robbins - Word MVP

Try saving the document as an .RTF file, close it, then open the .RTF file
and resave it as a Word document. Let us know if that reduces the size.

--
Please post any further questions or followup to the newsgroups for the
benefit of others who may be interested. Unsolicited questions forwarded
directly to me will only be answered on a paid consulting basis.

Hope this helps
Doug Robbins - Word MVP
 
Carl.

Plumer said:
3. There are a few tables (typically about 6) which have a max of about 10 rows
(in a really extreme case, may be 20)

If I make a table that is 60x15 (same number of cells as 6 tables of 15x10)
with no data in the cells in an otherwise blank document, I get a file size
of 115k. You might still have mystery text in the background, but tables
can make a file much larger than expected. It just keeps going up. A
60x300 gives me about 1.5 megabytes.

If you don't get any other answer, you could try re-creating one of these
manually on a fresh document just to see how big it gets. Paste the text to
a text file (to make sure it doesn't carry any junk with it) and then
cut/paste from there to the new document, add tables and various effects
manually, and see what you get.
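
As a rough check on the table figures quoted in this post, the sketch below works out the average file size per table cell for the two empty-table experiments. It is only back-of-envelope arithmetic: it assumes "115k" means 115 x 1024 bytes and "1.5 megabytes" means 1.5 x 1024 x 1024 bytes, and it ignores the size of the blank document itself. The function name `bytes_per_cell` is made up for illustration.

```python
# Back-of-envelope estimate of file bytes per table cell, using the two
# empty-table file sizes reported in the post above.
# Assumptions (not stated in the thread): 1k = 1024 bytes, 1 megabyte =
# 1024 * 1024 bytes, and the blank-document baseline is ignored.

KB = 1024


def bytes_per_cell(file_bytes: float, rows: int, cols: int) -> float:
    """Average number of file bytes attributable to each table cell."""
    return file_bytes / (rows * cols)


# One 60x15 table has as many cells as six 15x10 tables (900 cells).
cells_one_big_table = 60 * 15
cells_six_small_tables = 6 * 15 * 10

small = bytes_per_cell(115 * KB, 60, 15)        # reported as "115k"
large = bytes_per_cell(1.5 * KB * KB, 60, 300)  # reported as "about 1.5 megabytes"

print(f"cells: {cells_one_big_table} vs {cells_six_small_tables}")
print(f"60x15 table:  about {small:.0f} bytes per cell")
print(f"60x300 table: about {large:.0f} bytes per cell")
```

Under those assumptions both experiments come out at very roughly a hundred bytes per empty cell, so a handful of modest tables could plausibly account for tens of kilobytes, though not obviously for the whole of a 310KB file.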
 
Plumer

Kia Ora Doug me Carl e hoa ma

This is what I've done so far:

1. Save Doc As RTF: 381KB doc becomes 531KB as an RTF file (that's another puzzle but I'll set that aside for the moment)
2. Save RTF back as Doc: the size drops from 381 to 256
3. Replacing tables with tabs where that is possible (if not altogether desirable), size drops to 22
4. Replacing all fields with their values (Ctrl+Shift+F9), size drops to 19
5. The reconstructed doc now contains only 2853 characters
6. Clearly the points that have been made on this thread are well made and valid and I don't want to flog a dead horse but I'm still puzzled why 550 words comprising 2800 chars on 4 pages takes 199K
7. When I have a little time (later today) I will take the other approach of starting completely from scratch and report the result
8. BTW back in the early days of sending these docs to board members who did not have consistent software (some have Word, some use WordPerfect, different versions etc.) I thought to use RTF but that very quickly became unworkable simply because the size problem was exacerbated by conversion from doc to RTF. My intuition (which expected the result to be smaller) got it quite wrong.

Anyway, many thanks for your thoughts and for your willingness to share them.

Kia ora tatou
Plumer
 
AA

These documents have to be sent via email to a large number of
people who are to all intents and purposes computer illiterate and
the size of the document causes a lot of problems.

Plumer,

I bought one of the inexpensive PDF filemakers (around $45) last year
for a different reason but I now use it for emailing virtually all of
my Word or Excel documents, unless it is to someone who needs to
manipulate the file. You always know how it's going to look at the
other end, regardless of their printer driver, they can't alter the
file, and the files are usually less than half the size. I never did
any experimenting, but my guess is that the tables that increase Word
file size wouldn't make that much difference in a PDF.

Kia ora tatou! (Don't know what that means but it sounds cool)

Andy
 
Plumer

Kei te mihi ki a koe, AA! (Greetings to you, AA!)

Thanks for the suggestion -- it's a good one and if I were starting over I'd really give it some serious thought, but if I change the methodology now people will be after my blood. I guess we all get entrenched in things and don't like it when they change, especially if it means having to learn something new -- even if it isn't that challenging.

Kia ora tatou! (Many thanks to you all!)

Plumer
 
