Jay Freedman said:
I can think of at least two ways this might have happened:
- The document was saved with the "Save As Type" setting of "Word 97-2003
& 6.0/95 - RTF (*.doc)".
or
- The document was saved with the "Rich Text Format (*.rtf)" setting and
later renamed with a .doc extension.
Windows Explorer determines its "Type" by looking at the extension.
Anything named *.doc is considered to be a Microsoft Office Word 97-2003
Document. Word, once it loads the document, ignores the extension and
instead looks at the file contents.
- A binary-format Word document starts with the characters (hex)
D0 CF 11 E0.
- An RTF document starts with the characters {\rtf1.
If the question makes sense at all, I would say that the type indicated
by the file contents is the "real" type. Once the document is in memory,
though, it should be identical regardless of how it's stored on disk.
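As a rough illustration of the content check described above, here is a
small Python sketch that classifies a file by its first bytes and ignores
its name. The names sniff_format and sniff_file are made up for this
example, and it only knows the two signatures mentioned above; anything
else is reported as "unknown".

```python
# Signatures taken from the description above.
OLE_MAGIC = bytes.fromhex("D0CF11E0")  # first four bytes of a binary-format .doc
RTF_MAGIC = b"{\\rtf1"                 # first six bytes of an RTF file


def sniff_format(header: bytes) -> str:
    """Classify a file by its leading bytes, ignoring its extension."""
    if header.startswith(OLE_MAGIC):
        return "binary Word"
    if header.startswith(RTF_MAGIC):
        return "RTF"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read just enough of the file at `path` to classify it."""
    with open(path, "rb") as f:
        return sniff_format(f.read(len(RTF_MAGIC)))
```

Calling sniff_file on a .doc file that is really RTF should return "RTF"
no matter what Windows Explorer shows in its "Type" column.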
I don't think there's any way you can convince Windows Explorer to tell
you when a file is RTF masquerading as a .doc file. If you have some
programming skills, it wouldn't be too hard to write a program -- or even
a Word macro -- to loop through a folder and list the files that have a
.doc extension but whose contents start with the RTF characters.
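A minimal sketch of such a program, in Python rather than as a Word macro.
The function name find_rtf_named_doc is hypothetical, and the sketch
assumes that a file counts as RTF if it begins with the six characters
{\rtf1. It walks the folder and its subfolders, which may or may not be
what you want.

```python
import os

RTF_PREFIX = b"{\\rtf1"  # the characters an RTF file starts with


def find_rtf_named_doc(folder):
    """Return sorted paths of *.doc files under `folder` that start like RTF."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(folder):
        for name in filenames:
            # Match the extension case-insensitively (.doc, .DOC, ...).
            if not name.lower().endswith(".doc"):
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    head = f.read(len(RTF_PREFIX))
            except OSError:
                continue  # skip files that cannot be opened
            if head == RTF_PREFIX:
                hits.append(path)
    return sorted(hits)
```

To list the matches, call it with the folder to check and print each
returned path, for example: for p in find_rtf_named_doc("."): print(p)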
--
Regards,
Jay Freedman
Microsoft Word MVP
Email cannot be acknowledged; please post all follow-ups to the newsgroup
so all may benefit.
Thanks Jay. For my next step, I converted one of the 'dual type' files
(save as) to the Word 2007 format of .docx; the 'duality' is not present
in this version. Then I converted another 'dual' to a (straight) Microsoft
Office Word 97-2003 Document (.doc), and the 'duality' is not present. So
converting them to straight .doc or .docx gets rid of the duality.
Now then -- the reason I am interested in this is as follows: when doing a
search in Vista, a few of my files (known to have a search term within the
contents) would not show in the results for a string within the document;
it turns out that all the exemplars I have of this were the ones that had
the dual .doc/.rtf characteristic. I have now converted those to straight
.doc or .docx files, and behold, search within their contents now finds
the words previously not found. So... I would like to find any other
exemplars of this dual behavior so as to be sure I have everything
searchable; make sense? I can readily see that I have 322 documents with
the .doc (not .docx) extension, but of course I don't know how many of
those are really .rtf as well.
Interestingly, I have some straight .rtf Word documents and they search ok
for contents. I also have some other kinds of 'duals' -- some that are
'Microsoft Word 6.0/95' and .doc; these search contents ok.
It just seems to be the ones with that .doc/.rtf duality (possibly that is
just an artifact of something else, but it is what I see so far). I took
a brief, random look through the 322 .doc documents, and found a few more
.doc/.rtf duals -- and sure enough none of them would show up in a content
search.
Well, I have traced the problem pretty well, at least I think I know that
those 'dual' type files can hinder the search, although obviously, I do
not know why.
Computers............
Thanks for your input.