Scanning OCR to a Word .doc

Bill Bunton

I use an iMac w/OS-X and MS Word X v 1.2

Using a cheap all-in-one HP printer/copier/scanner w/OCR, I
recently and successfully scanned my first text document as
instructed to a .jpeg, then converted it to a Word .doc and
retained most of the original formatting. As an amateur in
this area, which is important to me, needless to say I was quite
proud of myself ... until I found out that when I try to do just
simple editing, I wind up with black boxes around the
paragraphs I'm working on, and can't seem to get rid of
them; and even when I try to "highlight," some paragraphs are
my normal green but some are blackened and unworkable ...

Q: Is this dilemma correctable? I sure would appreciate any
help in this frustrating matter. Bill
 
Elliott Roper

Which OCR package would that be?

Trying to get the freebie OCRs to preserve formatting is more hassle
than it is worth. As you have now observed, what looks like preserved
formatting is a bunch of cheating. Try hitting a plain text, or
paragraph mode in the OCR package, then apply styles back in while you
are fixing all the misrecognised text once you are back in Word.
 
Daniel Cohen

Elliott Roper said:
Which OCR package would that be?

Trying to get the freebie OCRs to preserve formatting is more hassle
than it is worth. As you have now observed, what looks like preserved
formatting is a bunch of cheating. Try hitting a plain text, or
paragraph mode in the OCR package, then apply styles back in while you
are fixing all the misrecognised text once you are back in Word.

I was wondering if the program is actually treating part of the page as
an image with text in it (or a text box) rather than straight text.
 
Elliott Roper

Daniel Cohen said:
I was wondering if the program is actually treating part of the page as
an image with text in it (or a text box) rather than straight text.

When I was using OmniPage not-so-Pro, it would happily encase snippets
of text in their own little boxes in order to get the thing looking
right, but utterly uneditable. Fortunately it had a setting that did a
good job of recognising without frills, and I never bothered with its
page layout setting ever again.

Come to think of it, it has been a long time since I felt the urge to
OCR anything. So much is on-line and far easier to plagiarise.

(cue Tom Lehrer song)
....
 
Gene van Troyer

Come to think of it, it has been a long time since I felt the urge to
OCR anything. So much is on-line and far easier to plagiarise.

Tut-tut, Elliott. Tom Lehrer aside, of course. :)

Gene van Troyer
 
Gene van Troyer

I've had excellent results with Abbyy FineReader Pro. ReadIRIS, hmmm, I
don't know. Even the ReadIRIS Pro update isn't as intuitively easy to use as
FineReader.

Remember that OCR software is not graphics or DTP software. Some of the
packages munge graphic and DTP elements even if they manage excellent text
recognition.

Gene van Troyer
 
