Formatting OCR'd Documents


Nelson Moffat

Word 2003, W2k, OmniPage Pro 14
I am scanning old typewritten meeting minutes, OCR'ing them with
OmniPage, and saving them as Word documents. Because they were typed
50 years ago on different typewriters, the resulting OCR documents have
many, many different styles--expanded, condensed, etc. I have tried
selecting the entire document in Word and then forcing the "Normal" style,
but it changes almost nothing. If I remove all formatting, I have to
manually reformat everything.

I want to convert the multiple fonts, spacing, kerning, etc. that
result from the OCR into Times New Roman, and get rid of the 180 or so
"Styles" that appear in the Style box. What's an easy way to do this?

TIA
 

Word Heretic

G'day Nelson Moffat,

Ctrl+A - Select Everything
Ctrl+Q - Reset Para formatting

Then use the Styles and Formatting task pane to select all instances of
each unwanted style and replace it with the desired style. Then delete
the unwanted styles.

www.editorium.com offers tools for doing this in batch if required.
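
If the cleanup has to be repeated across many files, the steps above could be scripted. What follows is only a minimal Python sketch, not a tested tool. It assumes Word is driven through COM using the third-party pywin32 package, and the names consolidate_styles, run_on_file and keep are invented for illustration. It also relies on Word reverting text to Normal when a user-defined paragraph style is deleted, so try it on a copy of the document first.

```python
def consolidate_styles(doc, keep=()):
    """Reset direct formatting, then delete user-defined styles not in `keep`.

    `doc` is expected to behave like a Word Document COM object.
    Returns the names of the styles that were deleted.
    """
    body = doc.Content
    body.ParagraphFormat.Reset()  # scripted counterpart of Ctrl+Q
    body.Font.Reset()             # scripted counterpart of Ctrl+Spacebar

    removed = []
    # Snapshot the collection first; deleting while iterating can skip items.
    for style in list(doc.Styles):
        if style.BuiltIn or style.NameLocal in keep:
            continue
        removed.append(style.NameLocal)
        style.Delete()
    return removed


def run_on_file(path, keep=()):
    """Hypothetical driver: open `path` in Word, clean it up, save it."""
    import os
    import win32com.client  # pywin32, Windows only (an assumption)

    word = win32com.client.Dispatch("Word.Application")
    try:
        # Word expects an absolute path.
        doc = word.Documents.Open(os.path.abspath(path))
        removed = consolidate_styles(doc, keep)
        doc.Save()
        doc.Close()
    finally:
        word.Quit(0)  # 0 = wdDoNotSaveChanges for anything still open
    return removed
```

Calling run_on_file("minutes.doc", keep=("Meeting Date",)) would leave built-in styles and the named style alone and return the list of styles it removed; the file name here is a placeholder.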


Steve Hudson - Word Heretic
Want a hyperlinked index? S/W R&D? See WordHeretic.com

steve from wordheretic.com (Email replies require payment)



Graham Mayor

That's the thing about OCR: it is not intelligent enough to format the
document properly and makes a dog's breakfast of the output, producing a
document that superficially looks like the original but falls apart
when you try to edit the results.

There is no easy way to reformat the document to match the original without
a lot of personal input. The best you can manage would be to remove all
formatting (or OCR as plain text) and then run AutoFormat. It still won't
look like the original, but it should give you a good starting point.

Unless you actually need the text to be editable (or searchable), it might be
simpler just to scan the documents as graphics.

--
<>>< ><<> ><<> <>>< ><<> <>>< <>><<>
Graham Mayor - Word MVP
 

Nelson Moffat

Thanks a lot. Actually, the whole reason I am doing this is so it IS
searchable. I use OmniPage Pro 14 and I'm not aware of an option to
OCR as plain text, but I'll look further.

Thanks again
 

Nelson Moffat

These are meeting minutes and I want a TOC based on the meeting date.
I have done that by defining a style called Meeting Date as a level 1
style, then instructing the TOC dialog to use this to build the TOC.
If I use AutoFormat as you suggest, and the date of the meeting is
set as a smart tag, can I base the TOC on that tag, or is it better to
use my user-defined style?
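
For reference, the style-based setup described above corresponds to a TOC field that uses the \t switch, which lists style names paired with TOC levels. Here is a minimal Python sketch of that field. It assumes Word is automated through COM (pywin32) and that the list separator is a comma, as in an English locale. The helper names toc_field_code and insert_toc are invented for illustration.

```python
WD_FIELD_EMPTY = -1  # Word constant wdFieldEmpty: field type is taken from the text


def toc_field_code(style_levels, separator=","):
    """Build a TOC field code from (style name, TOC level) pairs."""
    parts = []
    for style, level in style_levels:
        if not 1 <= level <= 9:
            raise ValueError("TOC level must be between 1 and 9")
        parts.extend([style, str(level)])
    return 'TOC \\t "{}"'.format(separator.join(parts))


def insert_toc(doc, field_code):
    """Insert the field at the very start of `doc` (a Word Document COM object)."""
    start = doc.Range(0, 0)
    # Fields.Add(Range, Type, Text, PreserveFormatting)
    return doc.Fields.Add(start, WD_FIELD_EMPTY, field_code, False)


# The style name comes from the post above; level 1 matches its description.
print(toc_field_code([("Meeting Date", 1)]))
```

Building the field from the style name keeps the TOC tied to whatever paragraphs carry the Meeting Date style, so it survives reformatting as long as that style is left in place.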
 
