Yep, that would be the reason: Spotlight can look at only the first 100,000
bytes of any file.
The question is whether or not the words you are looking for occur within
bytes 0 to 99,999 in the file as it is written to disk.
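If you want to check where a phrase actually lands in the file on disk, here is a rough Python sketch. Several things in it are my assumptions rather than anything documented: the 100,000-byte figure is simply the limit quoted above, the function names are made up for illustration, and it assumes the text sits in the file uncompressed as either 8-bit (cp1252) or 16-bit (UTF-16LE) characters. It will not work on compressed formats such as .docx.

```python
from typing import List, Optional

# Limit quoted in this thread; an assumption, not a documented constant.
SEARCH_DEPTH = 100_000

# Assumed on-disk encodings for the text (8-bit and 16-bit characters).
ENCODINGS = ("cp1252", "utf-16-le")


def encoded_forms(phrase: str) -> List[bytes]:
    """Return the phrase encoded in each encoding that can represent it."""
    forms = []
    for encoding in ENCODINGS:
        try:
            forms.append(phrase.encode(encoding))
        except UnicodeEncodeError:
            # The phrase has characters this encoding cannot represent.
            pass
    return forms


def first_offset(data: bytes, phrase: str) -> Optional[int]:
    """Return the lowest byte offset where the phrase starts, or None."""
    hits = [data.find(needle) for needle in encoded_forms(phrase)]
    hits = [hit for hit in hits if hit != -1]
    return min(hits) if hits else None


def within_depth(data: bytes, phrase: str, depth: int = SEARCH_DEPTH) -> bool:
    """True if the whole phrase lies inside bytes 0 .. depth-1."""
    head = data[:depth]
    return any(needle in head for needle in encoded_forms(phrase))


# Usage (hypothetical file name):
#   from pathlib import Path
#   data = Path("report.doc").read_bytes()
#   print(first_offset(data, "zebra"), within_depth(data, "zebra"))
```

If first_offset reports a position past the limit, that would be consistent with the explanation above, though it proves nothing about how Spotlight's importer really reads the file.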
In Microsoft Word files, the text is not necessarily stored in the file in
the order in which it appears in the document (in fact, if the document has
been edited much, you can guarantee that it isn't).
Internally, a Word document is a collection of "pieces", and only some of
those pieces are text. If a document were to have, for example, a graphic in the running
header, plus a large table of styles and numbering formats, it could well be
80 or 90 kB down the file before you even begin to see "text".
As an interesting "test", save a copy of that document as "text only". Word
will write it out to disk with the text in exactly the order that it prints,
and the file will contain nothing BUT text.
Then, chances are Spotlight will find the search string perfectly. Open the
text version of the document and save it as a Word document. Chances are,
Spotlight will still find the string, because there's still nothing much
between the front of the file and the search string.
Now throw a few graphics into the document, attach a template and update its
styles, and move a few sentences around. Make sure one of the sentences you
move is the one containing the text you are searching for (note: I said
"sentences", not "paragraphs"; we don't want to make things too easy for
it...). Save the document and close Word.
Chances are that at that point Spotlight will be unable to find the text,
because it has been moved within the file and now sits below Spotlight's
search depth.
In a typical Word document, something like half of the bytes in the file are
not "text": they're formatting, graphics, styles, numbering, languages,
fonts, etc., all the usual paraphernalia that distinguishes a Word document
from "text" and makes it difficult for search engines to index.
Hope this helps
In the words of Rowan and Martin -- Very interesting....
I tried a small experiment on a 500 page Word document that had a
unique word near the end.
Spotlight could not find the word, which confirms what you heard, Daiya.
I printed the document to PDF.
The PDF was in Spotlight's list as soon as the output to PDF completed.
Print to PDF was a little slower than Word's search, but not much.
Saving the doc as RTF also let Spotlight index it all the way to the
end.
I guess the race is now on to make a better Spotlight indexing plugin
for Word docs.
--
Please reply to the newsgroup to maintain the thread. Please do not email
me unless I ask you to.
John McGhie
Microsoft MVP, Word and Word for Macintosh. Consultant Technical Writer
Sydney, Australia +61 4 1209 1410