Spotlight/Finder

indago

I am using a PowerMac G4 with Mac OS X 10.4.11 (Tiger) and 1 GB of memory. I did
some experimenting with Spotlight and Finder and found what I believe to be
a problem. I compile documents of news articles on a particular subject in
chronological order, one after another; one is entitled
FinancialMarkets/Global, and articles relating to this title are compiled in
that document. I use Microsoft Word, in Office 2001. When a
document reaches around 95K characters, I begin a new one. If a
document reaches 100K, it doesn't register in the info frame at the bottom
of the window, so I keep them under this figure. I label them
FinancialMarkets/GlobalA, FinancialMarkets/GlobalB, and so on.

The FinancialMarkets/Global document is at present around 32.8K characters, with
articles beginning around 1995. I have been following the French trader
Kerviel and his exploits, with the enormous losses in the marketplace that
stunned the French banking industry, and I had compiled these articles in the
FinancialMarkets/Global document. When I tried to find this document with
Spotlight to add another article to it, Spotlight didn't find it; I searched
for the name kerviel. I tried the usual methods of forcing a re-index and
nothing worked. Searching for the names of some individuals mentioned earlier
in the document worked OK. Finally, today, I inserted the name kerviel early
in the document, around 20 characters in, and Spotlight found the document.
I moved the name further into the document and it still worked OK. I kept
moving the name further into the document until Spotlight didn't find it
anymore. The first article with the name Kerviel was at 9.1K characters;
the last position at which Spotlight still worked was around 8.1K characters
into this document.

I repeated this experiment in another document with a different name, with the
same results. Spotlight will index a document up to around 8.2K characters and
no further. This is unacceptable given the way I have structured my documents.
What can be done to correct this glaring deficiency?
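The test above moved the name forward a step at a time. Since the name was found near the start and not found at 9.1K characters, the cutoff can also be located with fewer edits by bisection: try the midpoint of the range, then keep whichever half still contains the boundary. This Python sketch assumes a single cutoff (the name is found below it and never above it). `found_at` is a hypothetical stand-in for one manual trial (place the name at that character offset, save, let Spotlight re-import the file, then search), and the 8192-character limit in the demo is an assumed figure, not a measured one.

```python
from typing import Callable


def find_index_cutoff(found_at: Callable[[int], bool], lo: int, hi: int) -> int:
    """Return the largest offset in [lo, hi] at which found_at() is still True.

    Assumes found_at(lo) is True and that once the term stops being found
    at some offset, it also fails at every larger offset (a single cutoff).
    """
    if not found_at(lo):
        raise ValueError("term is not found even at the lower bound")
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if found_at(mid):
            lo = mid  # still indexed: the cutoff is at mid or beyond
        else:
            hi = mid - 1  # not indexed: the cutoff is before mid
    return lo


if __name__ == "__main__":
    # Hypothetical stand-in for real Spotlight trials: pretend the importer
    # hands over only the first 8192 characters of the document.
    ASSUMED_LIMIT = 8192
    term = "kerviel"

    def fake_trial(offset: int) -> bool:
        # The whole term must fit inside the assumed limit to be found.
        return offset + len(term) <= ASSUMED_LIMIT

    print(find_index_cutoff(fake_trial, 20, 9100))  # prints 8185
```

For the 20 to 9,100 character range this needs about 14 trials instead of one per step.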
 
CyberTaz

Certainly a well-tested matter, and I'm certain that your findings will be
both interesting & helpful to some of those who frequent this group, but I
believe you're preaching to the wrong congregation:)

Both Finder & Spotlight are OS X features - neither MS nor any other
software developer determines how those features work. If there appears to
be a limit to how "deep" Spotlight searches into any given file it is
established by Apple. Likewise, if the parameter can be adjusted, the setting
would be in the Preferences for Spotlight itself.

You may get some useful insights from those who frequent the Apple
Discussions, many of whom are quite technically oriented. They may have some
suggestions to offer which go beyond the typical user level. This link is
directly to the OS X 10.4.x Spotlight forum:

http://discussions.apple.com/forum.jspa?forumID=757

Regards |:>)
Bob Jones
[MVP] Office:Mac
 
indago

080628 8:56 - CyberTaz posted:
You may get some useful insights from those who frequent the Apple
Discussions, many of whom are quite technically oriented. They may have some
suggestions to offer which go beyond the typical user level. This link is
directly to the OS X 10.4.x Spotlight forum:

http://discussions.apple.com/forum.jspa?forumID=757

I've been to the Apple Discussions forums on Apple's site and posted this
problem, but the only responses are more complaints from others who are
having problems with Spotlight, or the usual "fix" of forcing a re-index.

On Usenet, in a Mac OS X newsgroup, I did get this response:

-------------------------------------------
It's not initially clear where the problem really lies. Gathering
information from a document to be indexed for Spotlight is the
responsibility of small chunks of plugin code called importers. So right up
front the first possibility is that the importer for Word documents is only
submitting the first 8k of the content to the Spotlight engine, and in that
case the vendor that wrote the plugin needs to fix it.

On the other hand, it's possible that when indexing content (which is
handled differently from other indexed metadata) the importer *is* giving
Spotlight everything and the engine is giving up after 8k.

From a single, very quick test here it looks like the former; I was just
able to find a 32k AppleWorks document by searching for a made up word that
only occurs at the end. The solution then is to complain to whatever company
provided the plugin (who I presume is Microsoft, but best to check these
things). In a terminal, mdimport -nd1 <filename> will output a message
telling you the path of the importer plugin used; from there, a Get Info
will likely identify the vendor.
---------------------------------------------
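The check suggested in that reply can be scripted. This Python sketch (assuming a Python 3.7 or later interpreter is available) runs the same `mdimport -nd1 <filename>` command and keeps only the output lines that mention a `.mdimporter` bundle, the extension Spotlight importer plugins use. The exact wording of mdimport's debug output is not known here, so rather than parse a path out of it, the sketch just filters lines; the function names are hypothetical. See `man mdimport` for what `-n` and `-d` mean on a given system.

```python
import subprocess
from typing import List


def importer_lines(output: str) -> List[str]:
    """Return the lines of mdimport output that mention a .mdimporter bundle."""
    return [line.strip() for line in output.splitlines() if ".mdimporter" in line]


def find_importer(filename: str) -> List[str]:
    """Run `mdimport -nd1 <filename>` and keep the lines naming an importer.

    mdimport's debug messages may be written to stderr rather than stdout,
    so both streams are searched.
    """
    result = subprocess.run(
        ["mdimport", "-nd1", filename],
        capture_output=True,
        text=True,
    )
    return importer_lines(result.stdout + "\n" + result.stderr)
```

Calling `find_importer` on one of the Word files should show which importer bundle handled it; selecting that bundle in the Finder and using Get Info will likely identify the vendor, as the reply suggests.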
 
CyberTaz

Perhaps the .doc format wasn't constructed to provide more. However, I did
some quick checking with the .docx format produced by Word 2008. Similar to the
test you cited, I created a 172KB document comprising 195,126 characters
(249,156 with spaces) with a unique string at the very end. Spotlight had no
problem finding it instantly when that string was used for the search.
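To repeat a test like this one, or the offset experiment from the original post, with text whose length and marker position are known exactly, the filler can be generated. This Python sketch returns a string of exactly `total_chars` characters with a marker word starting at character offset `marker_at`, with a space on each side so it is searchable as a separate word. The function names and the filler word are hypothetical, and the result is plain text: it would still have to be pasted into Word and saved as .doc or .docx to exercise the Word importer.

```python
FILLER_WORD = "lorem "  # hypothetical filler; choose a marker that cannot occur in it


def make_test_text(total_chars: int, marker: str, marker_at: int) -> str:
    """Return exactly total_chars characters of filler with marker starting
    at 0-based character offset marker_at, surrounded by single spaces."""
    if marker_at < 1 or marker_at + len(marker) + 1 > total_chars:
        raise ValueError("marker and its surrounding spaces do not fit there")
    # Repeat the filler past the needed length, then slice exact pieces.
    filler = FILLER_WORD * (total_chars // len(FILLER_WORD) + 1)
    head = filler[: marker_at - 1]  # ends right before the leading space
    padded = " " + marker + " "
    tail = filler[: total_chars - len(head) - len(padded)]
    return head + padded + tail


def end_offset(total_chars: int, marker: str) -> int:
    """Offset that places the marker at the very end, followed by one space."""
    return total_chars - len(marker) - 1
```

For example, `make_test_text(20000, "zqxmarker", 9100)` puts the marker 9,100 characters in, and `end_offset` gives the position for a marker at the very end, as in the test described above.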

Regards |:>)
Bob Jones
[MVP] Office:Mac
 
Elliott Roper

A long time ago I looked into this when Tiger's Spotlight was not
returning hits on Word docs (.doc format) when the first occurrence was
more than 200K characters from the beginning of the file.

I dimly recall that the author of the importer (Apple, Microsoft or
some other entity - I forget) had somewhere stated that this was a
deliberate design decision.

My solution then was to print to PDF and use either Preview's search or
Spotlight (which of course was perfectly OK for the full length
of the PDF). Since both were an order of magnitude faster than Word's
own internal search of a single document, that was quite useful while
collaborating on a single large document with several versions. All the
PC-toting people would look to me in meetings to find references
several hundred pages into the damn thing.

Where Spotlight is concerned, matters have only got better with Leopard. I
have no first-hand knowledge of the state of the mdimporter for Word 2008 and
.docx files, nor do I expect to gain such knowledge in the near future.

I retrieved the above 'collaboration' document from the archive and can
confirm that the Word 2004 mdimporter I currently have on this
Leopard machine still fails to find matches more than 30-odd pages into
it.
 
