Mark Pavlick said:
List members:
I'd appreciate any suggestions regarding formatting scanned
documents. I need to incorporate scanned documents with various formats
into a book with a unified format. How can I impose an overall order on
these documents? Thanks in advance for any help. - Mark Pavlick
A few additional suggestions:
Examine your workflow. Choose your OCR tools carefully. Paste
unformatted.
In more detail:
Since scanning old documents is very time-consuming, make sure you do
it once and do it properly. I'd advise retaining the images and OCR-ing
those rather than scanning directly to text. That way you can avoid
re-handling the originals as you repair the OCR mistakes.
Test your OCR techniques and scanning settings together. For example,
here I use ReadIris OCR and a Canon LiDE 30 scanner. Scanning at 300 dpi
in greyscale with auto levels and saving to JPEG works well. When there
are lots of similar documents to be OCR'd, I train ReadIris on their
fonts and layout and store the settings for later use. For output, I set
ReadIris to ignore layout and fonts, so it does nothing more than read
the text and detect genuine paragraph endings.
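If your OCR software can only save plain text with a hard break at the
end of every line, you can approximate that "paragraph endings only"
result with a short script before pasting. This is a minimal Python
sketch, not something ReadIris does itself; it assumes a blank line
marks a real paragraph ending and treats every other line break as a
soft wrap. The function name `unwrap_ocr_text` is hypothetical.

```python
import re


def unwrap_ocr_text(raw: str) -> str:
    """Join hard-wrapped OCR lines into paragraphs (illustrative sketch).

    Assumption: a blank line marks a real paragraph ending; any other
    line break is a soft wrap and becomes a single space.
    """
    # Normalise Windows and old Mac line endings to "\n".
    text = raw.replace("\r\n", "\n").replace("\r", "\n")
    paragraphs = []
    # Split on blank (or whitespace-only) lines: these are the paragraph ends.
    for block in re.split(r"\n[ \t]*\n", text):
        lines = [ln.strip() for ln in block.split("\n") if ln.strip()]
        if not lines:
            continue
        para = lines[0]
        for ln in lines[1:]:
            # Guess: a letter + hyphen at line end followed by a lowercase
            # letter is a word split across lines, so rejoin it.
            if para.endswith("-") and para[-2:-1].isalpha() and ln[:1].islower():
                para = para[:-1] + ln
            else:
                para = para + " " + ln
        paragraphs.append(para)
    # One blank line between paragraphs in the cleaned-up text.
    return "\n\n".join(paragraphs)


if __name__ == "__main__":
    sample = "Since scanning old docu-\nments is slow,\ndo it once.\n\nSecond para\nhere."
    print(unwrap_ocr_text(sample))
```

Rejoining words split by a line-end hyphen is only a guess: a genuine
hyphenated compound that happens to fall at a line end will be merged
too, so the result still needs checking against the scan.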
My next step would work well in your workflow. I tell ReadIris to OCR
to the clipboard, then I cmd-tab to Word and paste unformatted. That way
the text picks up the book's uniform style, using techniques similar to
those little_creature suggested. Then, with the JPEG on one screen and
the Word doc on the other, I fix the OCR errors.
While doing that, if I'm being efficient, I scan more pages with
GraphicConverter to add to my pile of JPEGs.
If I'm not, I take lots of coffee breaks and chat to people on
newsgroups. It is such a soul-destroying task.