Patrick Schmid <[email protected]> shared these words of wisdom:
How about things *printed* to ON?
Printouts are pictures
Unfortunately, yes.
AFAICS this approach was not the best of all possible
solutions. An *Import* feature working on known file formats
would IMO have been preferable. In the case of PDFs, e.g.,
a tool like the one used in "ABBYY PDF Transformer" (which
produces really well-formatted output for WinWord [AFAICS based
on ABBYY's expertise in OCR software]) would have been far
better than sending text through a printer and then re-creating
the text by OCR. This seems a bit crazy to me.
So yes, they are of course OCRed.
But with what result?
A really bad one! (see below)
Right-click on one of those printouts and select copy all
text. Then paste the text somewhere else and take a look.
You can see for yourself then whether the quality of the OCR
is the issue or WDS.
Thanks for the suggestion!
It reveals how badly OCR is implemented in ON.
ON's OCR is the culprit, not WDS.
I.
1.) Sorry to say so: the OCR produces output that is hardly
usable for a search.
Unfortunately I cannot attach anything, so please permit some
longer input here:
a) Result of Copy+Paste in Acrobat:
0-110 Polizei 367 E106 Blessing, Peter, Dr.
0-112 Feuerwehr KÜN-190/156 C206 Bleyel, Bernd
0-19222 Rettungsleitstelle 318/467 D040 Bluthardt, Christian
A 221 A214 Bochert, Ralf, Dr.
367 E106 Ahrens, Uwe, Prof. 230/281/285 A304 Boelke, Klaus,
Dr. 0-579796 A014 AISEC 393 E141 Boese, Jürgen, Dr.
263/264 B026 Akademisches Auslandsamt 280 C040 Böhm, Hugo
375 Y104 Albrecht, Tobias 326 C009 Bossack, Sandra
KÜN-137 A406 Albrecht, Wolfgang, Dr. 202 B007 Böttcher,
Michael 432 F015 Asche, Gerd 449 Y006 Bouché, Daniel
207 A011 Asta (0-251460) 90 A013 Braner, Hannelore
0-506348 A012 Asta HN (Fax) 554 B001b Bräsel, Martina
KÜN-155 C105a Asta KÜN (KÜN-544756) 430 Z005 Bray, Laurent,
Dr. KÜN-53078 C105a Asta KÜN (Fax) KÜN-218 D110.1 Brazel,
Christa KÜN-208 A117 Auerbach, Achim 218 A204 Brecht, Ulrich,
Dr. 288 C035 Aufenthaltsraum KÜN-211 D013.1 Breitenbacher,
Manuel 640 A Aufzug 1-3 KÜN-166/167 C016 Breitkreuz,
Ehrenfried 641 B Behindertenaufzug 260 B023 Brnic, Sonja
644 D Aufzug 321 D110 Brückner, Hans
646 E Aufzug 384 F129 Bucher, Georg, Dr.
645 F Aufzug 221 A214 Buer, Christian, Dr.
403 F222 Auth, Werner, Dr. KÜN-252 D219 Burk, Uwe
<<
Words are separated by blanks, so they are easy to index and to
find in a search.
b) Copy+Paste from ON (input from PDF via ON printer)
0-110Polizei 367E106Blessing, Peter, Dr.
0-112Feuerwehr KÜN-190/156C206Bleyel, Bernd
0-19222Rettungsleitstelle 318/467D040Bluthardt, Christian
A 221A214Bochert, Ralf, Dr.
367E106Ahrens, Uwe, Prof.230/281/285A304Boelke, Klaus, Dr.
0-579796A014AISEC393E141Boese, Jürgen, Dr.
263/264B026Akademisches Auslandsamt280C040Böhm, Hugo
375Y104Albrecht, Tobias 326C009 Bossack, Sandra
KÜN-137A406Albrecht, Wolfgang, Dr.202B007Böttcher, Michael
432F015Asche, Gerd 449Y006Bouché, Daniel
207A011Asta (0-251460)90A013Braner, Hannelore
0-506348A012Asta HN (Fax)554B001bBräsel, Martina
KÜN-155C105aAsta KÜN (KÜN-544756)430Z005Bray, Laurent, Dr.
KÜN-53078C105aAsta KÜN (Fax)KÜN-218D110.1Brazel, Christa
KÜN-208A117Auerbach, Achim218A204Brecht, Ulrich, Dr.
288C035Aufenthaltsraum KÜN-211D013.1Breitenbacher, Manuel
640AAufzug 1-3KÜN-166/167C016Breitkreuz, Ehrenfried
641BBehindertenaufzug260B023Brnic, Sonja
644DAufzug321D110Brückner, Hans
646EAufzug384F129Bucher, Georg, Dr.
645FAufzug221A214Buer, Christian, Dr.
403F222Auth, Werner, Dr. KÜN-252D219Burk, Uwe
<<
Words are separated only where a comma + blank (", ") follows;
everywhere else they run together.
2.) I'm sure you'll agree that a search cannot work at all with
text material like that.
MOST URGENT fix needed.
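To make the difference concrete, here is a minimal Python sketch. It is only an illustration: I do not know how ON or WDS actually index text, so it simply assumes an indexer that splits on blanks, and the names `whitespace_tokens` and `has_word` are made up for this example. The two input lines are the first lines of samples a) and b) above.

```python
def whitespace_tokens(line):
    """Split a line on blanks only (assumed behaviour of a simple indexer)."""
    return line.split()


def has_word(line, word):
    """True if `word` appears as a token of its own, ignoring trailing , and ."""
    return word in [token.strip(",.") for token in whitespace_tokens(line)]


# First line of sample a) (Acrobat) and of sample b) (ON printout).
acrobat_line = "0-110 Polizei 367 E106 Blessing, Peter, Dr."
onenote_line = "0-110Polizei 367E106Blessing, Peter, Dr."

print(has_word(acrobat_line, "Polizei"))  # True
print(has_word(onenote_line, "Polizei"))  # False: fused into "0-110Polizei"
print(has_word(onenote_line, "Peter"))    # True: preceded by ", "
```

Under that assumption "Polizei" and "Blessing" can only be found in the Acrobat text, while "Peter" survives in both because it follows a comma + blank.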
II.
Words separated by comma + blank are found in the search.
If a page contains multiple hits, the hits are not shown in
the results list.
III.
While we are at it:
The search engine implemented in ON could be at least a bit
better. There are no options at all: neither truncated search
(wildcards) nor combined searches using Boolean operators. I
would have expected that at least an "expert mode" would be
provided, and that at least something like what Acrobat offers
would be available in ON (not to mention askSam's features).
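For anyone unsure what I mean by truncated and Boolean search, here is a rough Python sketch. It is not how Acrobat or askSam work internally; it is just a toy using the standard library's `fnmatch` wildcards, and the names `tokens`, `matches` and `search` as well as the `all_of` / `any_of` / `none_of` parameters are my own invention. The sample lines are taken from the Acrobat output above.

```python
from fnmatch import fnmatchcase


def tokens(text):
    """Lower-cased words of a line, split on blanks, outer punctuation removed."""
    return [t.strip(",.()").lower() for t in text.split()]


def matches(text, pattern):
    """True if any word matches a wildcard pattern such as 'ble*'."""
    return any(fnmatchcase(t, pattern.lower()) for t in tokens(text))


def search(lines, all_of=(), any_of=(), none_of=()):
    """Boolean search: AND over all_of, OR over any_of, NOT over none_of."""
    hits = []
    for line in lines:
        if not all(matches(line, p) for p in all_of):
            continue
        if any_of and not any(matches(line, p) for p in any_of):
            continue
        if any(matches(line, p) for p in none_of):
            continue
        hits.append(line)
    return hits


lines = [
    "0-110 Polizei 367 E106 Blessing, Peter, Dr.",
    "0-112 Feuerwehr KÜN-190/156 C206 Bleyel, Bernd",
    "644 D Aufzug 321 D110 Brückner, Hans",
]

# Truncated search: every line with a word starting with "ble".
print(search(lines, all_of=["ble*"]))
# Boolean combination: "ble*" AND NOT "polizei".
print(search(lines, all_of=["ble*"], none_of=["polizei"]))
```

The first call returns the Blessing and Bleyel lines, the second only the Bleyel line. Something on this level is all I would ask of an "expert mode".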
Although I would prefer to have the material from PDFs in ON, I
guess that in order to be able to perform intelligent
searches I will have to stick with Acrobat for PDF-based
material and askSam for other material [siiiigh]
Rainald
(who is seriously disappointed)