OCR on copy&paste (NOT scanned) text

sangatsu

Can you perform OCR on copied-and-pasted text (NOT scanned text) in something
such as MS Document Imaging?
I.e., how can you copy and paste into Document Imaging from, say, a PDF file,
without having to print and scan each page?
Thanks
 
grammatim

It seems to depend on what you're reading your pdf with. In Reader 9,
I can select text in a pdf, and copy/paste (or even drag'n'drop!) into
Word. (Use Paste Special > No Formatting or you get text boxes and
paragraph markers and such.)
 
Graham Mayor

If you are copying and pasting text from a PDF file that was created from
text in the first place and is not protected against copying, then the text
is already editable and does not need OCR software. Paste it into Word.

If what you are pasting is a graphic, paste it into a graphics application
such as Paint and save it as TIF so that Microsoft Document Imaging can open
and read it. Otherwise you need software that will convert it, such as
PDF2Text, or PDF-compatible OCR software such as FineReader.

--
<>>< ><<> ><<> <>>< ><<> <>>< <>><<>
Graham Mayor - Word MVP


<>>< ><<> ><<> <>>< ><<> <>>< <>><<>
 
sangatsu

grammatim said:
It seems to depend on what you're reading your pdf with. In Reader 9,
I can select text in a pdf, and copy/paste (or even drag'n'drop!) into
Word. (Use Paste Special > No Formatting or you get text boxes and
paragraph markers and such.)
Grammatim:
You are right, of course. I normally have no problem copying from a PDF, but
I forgot to mention that the text in this PDF has been scanned as a graphic,
so copy/paste does not work in this case.
Sangatsu
 
sangatsu

Thanks Graham,
Indeed, what I am pasting IS a graphic, so directly pasting it into Word (as
text) won't work; that's why I was wondering if it was possible to import the
graphics into imaging software from the Clipboard and then OCR it as if I had
scanned it. But your suggestion to copy it into Paint and save as TIF files
is good, certainly better than having to print and scan 50+ pages of that PDF
so as to be able to OCR it with MS's imaging software.
By the way, I've tried PDF2Text but it creates text as text boxes, so it's
very difficult to edit that text, almost as much work as simply retyping it.
I don't know if there's a newer version that can convert without the text
boxes (this would mean no formatting, but that's easier to rectify than
editing those text boxes).
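
For anyone who would rather script the "paste into Paint and save as TIF" step than repeat it by hand for 50+ pages, here is a minimal Python sketch. It assumes the third-party Pillow library is installed and that an image has just been copied to the Clipboard. The helper names `tiff_path` and `save_clipboard_as_tiff` are made up for this illustration, and whether Microsoft Document Imaging accepts the resulting TIF files has not been checked here.

```python
from pathlib import Path


def tiff_path(folder: Path, page_number: int) -> Path:
    """Return a zero-padded file name such as page_007.tif inside folder."""
    return folder / f"page_{page_number:03d}.tif"


def save_clipboard_as_tiff(folder: Path, page_number: int) -> Path:
    """Save the image currently on the Clipboard as a TIFF file.

    Assumption: Pillow is installed (pip install Pillow). It is imported
    here so that the naming helper above works without it.
    """
    from PIL import Image, ImageGrab

    grabbed = ImageGrab.grabclipboard()
    # grabclipboard() may return an image, a list of file names, or None.
    if not isinstance(grabbed, Image.Image):
        raise RuntimeError("The Clipboard does not currently hold an image.")

    folder.mkdir(parents=True, exist_ok=True)
    target = tiff_path(folder, page_number)
    grabbed.save(target, format="TIFF")
    return target
```

Typical use would be to copy one page image in the PDF reader, call `save_clipboard_as_tiff(Path("scans"), 1)`, copy the next page, call it again with `2`, and so on, then open the TIF files in the imaging software for OCR.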
 
Graham Mayor

I find it difficult to accept that PDF2Text formats as text boxes, when the
txt format does not support text boxes.
It does, however, produce a layout much as you see it in the PDF, which is not
much fun when you have columns.
Are you sure you are not thinking of PDF2Word, which formats in frames?
Better still, use FineReader 9, with which you can exercise much more control
over the output.

 
sangatsu

Graham,
Seems logical that PDF2Text doesn't support text boxes, so I may have been
thinking of PDF2Word, as the converted text did have text boxes (as a matter
of fact, if I remember correctly, almost every line was in a text box). But
that's a program I used 3-4 years ago and didn't actually use more than a
couple of times, as it required too much work reformatting, although it did
produce copy that was basically identical to the PDF. I didn't see the point
of having an editable copy that required as much work to edit as
retyping it. Moral: I'll try FineReader, as you suggested. I also found an
incredible piece of software that seems to do the same as FineReader, and can
even open PDF files to perform OCR. It's free (I know, free stuff is not
normally worth more than its price, but this one is something else). It's
called FreeOCR.
Sangatsu
 
Graham Mayor

I had a look at FreeOCR. I suppose it might be OK with simple PDFs, but
FineReader it ain't!

 
sangatsu

Yes, but it's not the same price either, and it did allow me to OCR a 59-page
PDF (one page at a time, granted, but then, that's still better than printing
and scanning that number of pages).
And by the way, I've had a look at your site too. It's great, and I'll
probably be visiting often.
Sangatsu
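
Doing a long scanned PDF one page at a time can in principle be automated. The sketch below is one possible approach in Python, not what anyone in this thread used: it swaps in two open-source command-line tools that are not mentioned above, `pdftoppm` (from Poppler) to render each page to a TIF file and `tesseract` to OCR it. Both are assumed to be installed and on the PATH, and the function names are hypothetical.

```python
import subprocess
from pathlib import Path


def rasterize_command(pdf: Path, out_prefix: Path, dpi: int = 300) -> list[str]:
    """Build the pdftoppm call that renders every page of pdf to a TIFF file."""
    return ["pdftoppm", "-r", str(dpi), "-tiff", str(pdf), str(out_prefix)]


def ocr_command(image: Path, lang: str = "eng") -> list[str]:
    """Build the tesseract call; the text lands in <image name without .tif>.txt."""
    return ["tesseract", str(image), str(image.with_suffix("")), "-l", lang]


def page_number(image: Path) -> int:
    """Read the page number from a name such as page-07.tif."""
    return int(image.stem.rsplit("-", 1)[1])


def ocr_scanned_pdf(pdf: Path, work_dir: Path) -> str:
    """Render all pages to TIFF, OCR each one, and return the combined text."""
    work_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(rasterize_command(pdf, work_dir / "page"), check=True)

    texts = []
    # Sort by the numeric suffix so page 10 does not come before page 2.
    for image in sorted(work_dir.glob("page-*.tif"), key=page_number):
        subprocess.run(ocr_command(image), check=True)
        texts.append(image.with_suffix(".txt").read_text(encoding="utf-8"))
    return "\n".join(texts)
```

Calling `ocr_scanned_pdf(Path("scanned.pdf"), Path("ocr_work"))` would leave one TIF and one text file per page in `ocr_work` and return all the recognised text as a single string, ready to paste into Word. Recognition quality depends on the scan, so the result still needs proofreading.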
 
