Does deleting text from Word REALLY delete it?

Jeff Wiseman

I had the following question asked of me the other day and I had
no answer. Anyone here know what the story on this is?

--------------------------
Do you know whether, when you delete data from MS Word (without
electronic revisions turned on), the deleted data continues
to live inside the file even though you can't see it? Same
question, but what if you delete the data and then do a "Save As"
under a different file name? Same question for PowerPoint?
---------------------------

We handle some sensitive information here and files are
frequently stripped of some information in order to provide other
parts of it to an outside company.

Any knowledgeable guidance here would be great!
 
macropod

Hi Jeff,

If you're using:
.. Track Changes (see Tools|Track Changes|Highlight Changes)
.. 'Allow fast saves' (see Tools|Options|Save) or
.. Versions (see under File| Save As|Tools|Save version)
Any one or more of these can preserve the deleted data.

Data can also be 'hidden' in bookmarks created by SET fields and in text boxes & shapes that have been moved off the visible page.
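
For anyone on the newer .docx format, here is a minimal sketch of how you might check a file for leftover tracked changes. It assumes the file is a .docx package (a zip archive containing word/document.xml) and does not apply to the older binary .doc format; the function name find_tracked_changes is just something made up for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by elements in word/document.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def find_tracked_changes(docx):
    """Return (deleted_texts, insertion_count) for a .docx path or file object.

    Hypothetical helper: tracked deletions keep their text in <w:delText>
    elements, and tracked insertions are wrapped in <w:ins> elements.
    """
    with zipfile.ZipFile(docx) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    deleted = [el.text or "" for el in root.iter(W + "delText")]
    inserted = sum(1 for _ in root.iter(W + "ins"))
    return deleted, inserted
```

An empty result only says there are no tracked changes in the main document part. Headers, footers, comments and footnotes live in separate parts and would need the same treatment.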
 
Jeff Wiseman

macropod said:
Hi Jeff,

If you're using:
. Track Changes (see Tools|Track Changes|Highlight Changes)
. 'Allow fast saves' (see Tools|Options|Save) or
. Versions (see under File| Save As|Tools|Save version)
Any one or more of these can preserve the deleted data.

Data can also be 'hidden' in bookmarks created by SET fields and in text
boxes & shapes that have been moved off the visible page.


I hadn't thought of the fast saves and versions issue. That's the
kind of information that I'm looking for.

I understand about data being hidden out of view. When you have
to take a file off of a classified PC, before the file is
declassified it must be converted to an ordinary ASCII text file
in order to avoid this very issue. Word files cannot be
declassified. The process is so onerous that it is just easier to
print the file, declassify the printout, and then pull it in on
an unclassified machine using OCR :)

So I guess that my question now is:
- If Track Changes is off and all previous changes are accepted
or rejected,
- And if Fast Saves is disabled,
- And if versioning is disabled,
- And I then manually delete text in the document that I can see,
is there any other caching that occurs that might retain that
text I just deleted after I've done my Save and quit (and if
there is, will a Save As avoid this)?

The only thing I can think of is if there was a style set up as a
boilerplate or something. Deleting the visible text wouldn't
remove it from the style in the template. But it is extremely
unlikely that the nature of such text would ever be a problem.
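
One way to test this empirically is to type a distinctive phrase, delete it, save, and then search the raw bytes of the saved file for that phrase. The following is a rough sketch of that check, not a proof of anything. It assumes the text would be stored either as 8-bit (cp1252) or as UTF-16LE characters, and the function name find_phrase is hypothetical.

```python
def find_phrase(data: bytes, phrase: str):
    """Return {encoding: [byte offsets]} for each encoding in which phrase occurs.

    Assumption: the text, if still present, is stored uncompressed as
    cp1252 or UTF-16LE. Other encodings or compressed storage are not covered.
    """
    hits = {}
    for enc in ("cp1252", "utf-16-le"):
        try:
            needle = phrase.encode(enc)
        except UnicodeEncodeError:
            continue  # phrase cannot be represented in this encoding
        offsets = []
        pos = data.find(needle)
        while pos != -1:
            offsets.append(pos)
            pos = data.find(needle, pos + 1)
        if offsets:
            hits[enc] = offsets
    return hits
```

Usage would be something like find_phrase(open(path, "rb").read(), "my marker phrase"). A hit shows the text survived the save. No hit does not show it is gone, since the text could be compressed (as in a .docx) or split across storage blocks.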
 
Phillip Jones

Unless things have changed in either 2004 or 2008,
doing a Save As... makes a completely new document.

On the other hand, doing just Save... appends the changed
information to the end of the document, and the old information is just
marked invisible. A document can grow over time from, say, 20k to 200k,
depending upon how many revisions you have made.


--
------------------------------------------------------------------------
Phillip M. Jones, CET |LIFE MEMBER: VPEA ETA-I, NESDA, ISCET, Sterling
616 Liberty Street |Who's Who. PHONE:276-632-5045, FAX:276-632-0868
Martinsville Va 24112 |[email protected], ICQ11269732, AIM pjonescet
------------------------------------------------------------------------

If it's "fixed", don't "break it"!

mailto:p[email protected]

<http://www.kimbanet.com/~pjones/default.htm>
<http://www.kimbanet.com/~pjones/90th_Birthday/index.htm>
<http://www.kimbanet.com/~pjones/Fulcher/default.html>
<http://www.kimbanet.com/~pjones/Harris/default.htm>
<http://www.kimbanet.com/~pjones/Jones/default.htm>

<http://www.vpea.org>
 
Elliott Roper

Jeff Wiseman said:
I hadn't thought of the fast saves and versions issue. That's the
kind of information that I'm looking for.

I understand about data being hidden out of view. When you have
to take a file off of a classified PC, before the file is
declassified it must be converted to an ordinary ASCII text file
in order to avoid this very issue. Word files cannot be
declassified. The process is so onerous that it is just easier to
print the file, declassify the printout, and then pull it in on
an unclassified machine using OCR :)

So I guess that my question now is:
- If Track Changes is off and all previous changes are accepted
or rejected,
- And if Fast Saves is disabled,
- And if versioning is disabled,
- And I then manually delete text in the document that I can see,
is there any other caching that occurs that might retain that
text I just deleted after I've done my Save and quit (and if
there is, will a Save As avoid this)?

Yep. There is still all the user and other metadata, and then there is
the crud between the end of file and the end of the last logical disk
block of the file.

The crud could include sensitive stuff that was on the author's
computer at some time in the past with nothing to do with Word or the
doc being transported.

Check the archives; there was a recent flurry of activity on this very
topic.

The only thing I can think of is if there was a style set up as a
boilerplate or something. Deleting the visible text wouldn't
remove it from the style in the template. But it is extremely
unlikely that the nature of such text would ever be a problem.

You can have daft things left behind like the path to the file you have
been working on. The folder and computer name could be quite
significant to some adversary.
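
As an illustration of the user metadata point, here is a small sketch that lists the core properties stored in a .docx package. It assumes the newer .docx format (a zip archive with a docProps/core.xml part), not the binary .doc format, and the function name core_properties is made up for this example.

```python
import zipfile
import xml.etree.ElementTree as ET


def core_properties(docx):
    """Return {property name: value} from docProps/core.xml, or {} if absent.

    Hypothetical helper: namespace prefixes are dropped, so <dc:creator>
    is reported under the key "creator".
    """
    with zipfile.ZipFile(docx) as zf:
        if "docProps/core.xml" not in zf.namelist():
            return {}
        root = ET.fromstring(zf.read("docProps/core.xml"))
    props = {}
    for el in root:
        name = el.tag.split("}")[-1]  # strip the "{namespace}" part of the tag
        if el.text and el.text.strip():
            props[name] = el.text.strip()
    return props
```

Typical keys are things like creator and lastModifiedBy. Other parts of the package can carry further metadata, so this is a starting point rather than a complete audit.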

I can see why Word files should *never* be declassified.
Print, declassify the paper, then OCR on an unclassified machine looks
like a pretty efficient mechanism to me.

As long as Microsoft keeps the file format a secret there is no way you
can prove there is nothing hidden in a Word document you don't want
sent. Even if there were a proper specification of the format, the
proof would be quite onerous.

The .docx format is a step in the right direction. It is XML in
intention. Of course a single undocumented binary blob inside would
kick the whole declassification proof into touch. (Word's OOXML
standard candidate has a ton of those I believe)
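
To get a feel for how many opaque parts a given .docx carries, something like the sketch below could list every part in the package that is not XML. It assumes the file is a zip-based .docx, and it treats anything not ending in .xml or .rels as a blob worth a closer look, which is a simplification. The name non_xml_parts is hypothetical.

```python
import zipfile


def non_xml_parts(docx):
    """Return a sorted list of package parts that are not .xml or .rels files.

    Assumption: file extension is a good enough hint of the part's type.
    Embedded OLE objects, images and macro projects would show up here.
    """
    with zipfile.ZipFile(docx) as zf:
        return sorted(
            name
            for name in zf.namelist()
            if not name.lower().endswith((".xml", ".rels"))
        )
```

An empty list does not make the document clean; it only means every part is at least readable XML that can be inspected further.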

In a less stringent situation, printing to PDF might be acceptable. PDF
is well enough documented to permit a proof in principle.

That only leaves you with the possibility of steganographic images.
How can you be sure there is not a hidden message in the colour table
of a JPG or GIF, or perhaps a message in the least significant bits of
an audio file attachment?
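
To show how little it takes, here is a toy sketch of the least-significant-bit trick. It works on a plain run of bytes standing in for uncompressed audio samples or pixel values; it is not how any particular tool does it, and real image or audio formats would need decoding first. The names hide and reveal are made up.

```python
def hide(cover: bytes, message: bytes) -> bytes:
    """Store each bit of message in the lowest bit of successive cover bytes."""
    bits = [(byte >> i) & 1 for byte in message for i in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("cover is too small for this message")
    out = bytearray(cover)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 0xFE) | bit  # clear the lowest bit, then set it
    return bytes(out)


def reveal(stego: bytes, length: int) -> bytes:
    """Read back length bytes from the lowest bits of stego."""
    out = bytearray()
    for n in range(length):
        value = 0
        for b in stego[n * 8:(n + 1) * 8]:
            value = (value << 1) | (b & 1)
        out.append(value)
    return bytes(out)
```

Each cover byte changes by at most 1, which is why the result looks or sounds the same and why inspecting the file by eye tells you nothing.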

Digital security is a *very* slippery topic.
 
macropod

Hi Elliott,

The 'crud' you refer to after the EOF marker isn't included with the file when it is copied to another folder or drive, or attached
to an email. It only exists on the original media. Neither does it remain in the file's metadata, except in the circumstances I
outlined in my previous post.
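
A quick way to convince yourself of this is to compare a file's logical size with the space allocated for it, then copy it and look at what arrives. The sketch below does that with a temporary file. The function name copy_and_compare is hypothetical, and the allocated size relies on st_blocks, which is not available on every platform (Windows, for one), so it may come back as None.

```python
import os
import shutil
import tempfile


def copy_and_compare(payload: bytes):
    """Write payload to a file, copy it, and report sizes plus the copied bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "original.bin")
        dst = os.path.join(tmp, "copy.bin")
        with open(src, "wb") as f:
            f.write(payload)
        shutil.copyfile(src, dst)
        with open(dst, "rb") as f:
            copied = f.read()
        st = os.stat(src)
        # st_blocks counts 512-byte blocks where the platform provides it.
        blocks = getattr(st, "st_blocks", None)
        allocated = blocks * 512 if blocks is not None else None
        return {
            "logical_size": st.st_size,
            "allocated_size": allocated,
            "copied": copied,
        }
```

The copy contains exactly the logical bytes of the file. Whatever sits in the rest of the allocated block stays behind on the original disk.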

Turning a document into a PDF is certainly one way of minimising the risk of unwanted data being transmitted with it. The XPS format
is another.

Neither the doc format nor the docx format is secret. MS has published both.
 
macropod

Hi Jeff,

You might want to delete any macros (in case they contain anything sensitive) and any formfields & content controls (which could
hold sensitive material), and convert other fields to plain text (this turns linked objects into embedded objects & field
calculations to just their results). If you've got any custom autotext entries in the document (not just its template), you might
want to delete those too.
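
For .docx or .docm files, a rough inventory of these items could be sketched as below. It assumes the zip-based format with the main text in word/document.xml, that a macro project is stored in a part named vbaProject.bin, and that autotext lives under word/glossary/. The function name sensitive_containers is made up, and headers, footers and other parts are not examined.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by elements in word/document.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def sensitive_containers(docx):
    """Summarise macros, fields, content controls and glossary parts in a package."""
    with zipfile.ZipFile(docx) as zf:
        names = zf.namelist()
        root = ET.fromstring(zf.read("word/document.xml"))
    return {
        # Assumption: a VBA project is stored in a part called vbaProject.bin.
        "has_macros": any(n.endswith("vbaProject.bin") for n in names),
        # Complex fields keep their field code text in <w:instrText>.
        "field_codes": [el.text or "" for el in root.iter(W + "instrText")],
        # Simple fields are single <w:fldSimple> elements.
        "simple_fields": sum(1 for _ in root.iter(W + "fldSimple")),
        # Content controls are <w:sdt> elements.
        "content_controls": sum(1 for _ in root.iter(W + "sdt")),
        # Assumption: autotext/building blocks sit in the glossary document part.
        "glossary_parts": [n for n in names if n.startswith("word/glossary/")],
    }
```

This only reports what is there. Removing the items is still best done in Word itself, so that the package stays consistent.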
 
Elliott Roper

macropod said:
Hi Elliott,

The 'crud' you refer to after the EOF marker isn't included with the file
when it is copied to another folder or drive, or attached to an email. It
only exists on the original media. Neither does it remain in the file's
metadata, except in the circumstances I outlined in my previous post.

Ah, of course you are right. I let my Unix ignorance show.
My excuse is I spent too long on VMS, where it probably does get copied
for anything except pure text unless you have "high water marking"
enabled, in which case the crud between EOF and the end of the allocation
block is explicitly scribbled over before going along for the ride.
 
