Spotlight problem in MS Word

catland

I'm using Word 2004 with OS X 10.4, on TWO DIFFERENT COMPUTERS.

I've indexed my drive twice now on both. On each computer Spotlight
will not find a word inside the same MS Word doc that I know is there.
This is not an old doc. If I copy the contents and paste into
TextEdit, Spotlight finds the word immediately. If I open the doc and
select certain words, Spotlight will find them if they are within the
first 16 pages. After that Spotlight will not find them (the word I was
originally looking for is in the last few pages).

Is there an issue with Spotlight and Word?
 
Daiya Mitchell

This is not a known issue, so far as I know.

Can you check whether the problem is only with this doc, or with any doc
over 16 pages?
 
catland

With other MS Word docs it can even be less than 16 pages. I don't know
what is going on.

It doesn't index old Mariner Write docs either.

No problems with TextEdit.
 
Daiya Mitchell

Apparently Spotlight will only search the first 100K of raw text. There are
a few mentions on the web of this. I'm guessing that different file formats
put different information into that "first 100K" that Spotlight searches?

Sorry not to be more helpful.
 
Elliott Roper

In the words of Rowan and Martin -- Very interesting....

I tried a small experiment on a 500 page Word document that had a
unique word near the end.
Spotlight could not find the word, which confirms what you heard, Daiya.
I printed the document to PDF.
The PDF was in Spotlight's list as soon as the output to PDF completed.
Print to PDF was a little slower than Word's search, but not much.

Saving the doc as RTF also let Spotlight index it all the way to the
end.

I guess the race is now on to make a better Spotlight indexing plugin
for Word docs.
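
For anyone who wants to repeat that experiment without clicking around, here is a rough Python sketch that asks Spotlight from the command line which copies of a document it can find. It assumes a Mac with the `mdfind` tool and its `-onlyin` option available, and a reasonably recent Python; the function names and the file names in the usage note are made up for illustration.

```python
import subprocess


def build_mdfind_command(term, folder):
    """Build an mdfind command line that limits the search to one folder."""
    return ["mdfind", "-onlyin", folder, term]


def spotlight_hits(term, folder):
    """Return the paths Spotlight reports for term inside folder (macOS only)."""
    completed = subprocess.run(
        build_mdfind_command(term, folder),
        capture_output=True,
        text=True,
        check=True,
    )
    return [line for line in completed.stdout.splitlines() if line]


def which_copies_are_found(term, folder, filenames):
    """Map each file name to whether any Spotlight hit ends with that name."""
    hits = spotlight_hits(term, folder)
    return {
        name: any(hit.endswith("/" + name) for hit in hits)
        for name in filenames
    }
```

Calling `which_copies_are_found("xyzzy", "/Users/me/test", ["report.doc", "report.pdf", "report.rtf"])` (hypothetical word, folder and file names) would show which of the three saved copies Spotlight has indexed deeply enough to reach the word.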
 
John McGhie [MVP - Word and Word Macintosh]

Yep: That would be the reason: Spotlight can look at only the first 100,000
bytes of any file.

The question is whether or not the words you are looking for occur within
bytes 0 to 99,999 in the file as it is written to disk.
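
One way to check that for a particular file is to look for the word in the raw bytes and see how far down it sits. The following is a minimal Python sketch, not a real Word parser. It takes the 100,000-byte figure from this thread as an assumption, and it assumes the text is stored either as 8-bit Windows text or as UTF-16; the function and constant names are invented for the example. A word that Word has split across two "pieces" will not be found this way.

```python
from pathlib import Path

# Figure quoted in this thread; treated here as an assumption, not a documented limit.
SEARCH_DEPTH = 100_000

# Assumed encodings for text stored inside a Word file.
CANDIDATE_ENCODINGS = ("cp1252", "utf-16-le")


def first_offset(data, term):
    """Return the lowest byte offset at which term appears in data, or None."""
    hits = []
    for encoding in CANDIDATE_ENCODINGS:
        try:
            needle = term.encode(encoding)
        except UnicodeEncodeError:
            continue  # term cannot be written in this encoding
        position = data.find(needle)
        if position != -1:
            hits.append(position)
    return min(hits) if hits else None


def within_search_depth(data, term, depth=SEARCH_DEPTH):
    """True if term starts inside the first `depth` bytes of data."""
    offset = first_offset(data, term)
    return offset is not None and offset < depth


def report(path, term):
    """Print where term first appears in the file at path, if anywhere."""
    data = Path(path).read_bytes()
    offset = first_offset(data, term)
    if offset is None:
        print(f"{term!r} not found as contiguous bytes in {path}")
    else:
        side = "inside" if offset < SEARCH_DEPTH else "beyond"
        print(f"{term!r} first appears at byte {offset}, "
              f"{side} the first {SEARCH_DEPTH} bytes")
```

Running `report("thesis.doc", "xyzzy")` (both names hypothetical) on the original document and on a freshly saved copy should show whether the word has moved above or below that depth.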

In Microsoft Word files, the text is not necessarily stored in the file in
the order in which it appears in the document (in fact, if the document has
been edited much, you can guarantee that it isn't...)

A Word document internally is a collection of "pieces"; some of those pieces
are text. If a document were to have, for example, a graphic in the running
header, plus a large table of styles and numbering formats, it could well be
80 or 90 kB down the file before you even begin to see "text".

As an interesting "test", save a copy of that document as "text only". Word
will write it out to disk with the text in exactly the order that it prints,
and the file will contain nothing BUT text.

Then, chances are Spotlight will find the search string perfectly. Open the
text version of the document and save it as a Word document. Chances are,
Spotlight will still find the string, because there's still nothing much
between the front of the file and the search string.

Now throw a few graphics into the document, attach a template and update its
styles, and move a few sentences around. Make sure one of the sentences you
move is the one containing the text you are searching for (note: I said
"sentences", not "paragraphs", we don't want to make things too easy for
it...). Save the document and close Word.

Chances are that at that point, Spotlight will be unable to find the text,
because it has been moved within the file to a position below
Spotlight's search depth.

In a typical Word document, something like half of the bytes in the file are
not "text": they're formatting, graphics, styles, numbering, languages,
fonts etc. All the usual paraphernalia that distinguishes a Word document
from "text", and makes it difficult for search engines to index...

Hope this helps

--

Please reply to the newsgroup to maintain the thread. Please do not email
me unless I ask you to.

John McGhie
Microsoft MVP, Word and Word for Macintosh. Consultant Technical Writer
Sydney, Australia +61 4 1209 1410
 
catland

Yes, this is all very well.

All I want to know is: is Microsoft going to produce a plug-in that will
allow Spotlight to find ANY word within a document?
 
Clive Huggan

Wouldn't we all, cat!

I take it your question is rhetorical, since the MVPs and other regulars who
associate in this newsgroup are not Microsoft employees, and the Microsoft
employees who watch and sometimes participate aren't allowed to tell you...
;-)

Cheers,
Clive Huggan
============