Converting PDF To Word

  • Thread starter Rafael Montserrat

Rafael Montserrat

Hi,

Is there a way within Word 2004 to bring a PDF document into a Word
file? I have tried select > copy > paste, but the problem I run into, in
addition to a general loss of formatting, is that in the transferred
file every line ends with a carriage return (a paragraph mark).
The paragraphs don't show up in Word with an indented first line. How
do I format the document in Word with first-line-indented paragraphs?

Thanks, Rafael
 
Elliott Roper

You are lucky it works that well. Nicking PDFs back into Word is not
meant to work. (Try grabbing one column of a multi-column PDF)

Over time, I have experimented with all sorts of tricks, including
OCR-ing the PDF.

Usually, pasting unformatted and then re-applying your own style is the
most straightforward approach. You can usually fix up end-of-line messes
with find and replace. A few macros have been posted here to do similar
jobs. Try Googling the group with search terms like "paragraph" and
"macro".
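
The end-of-line clean-up can also be done outside Word before pasting. Here is a minimal sketch in Python, not one of the macros posted in this group. It assumes the copied text uses blank lines between real paragraphs, which will not hold for every PDF; the function name unwrap_paragraphs is made up for illustration.

```python
import re


def unwrap_paragraphs(text: str) -> str:
    """Join hard-wrapped lines so each paragraph ends up on one line."""
    # Normalize Windows and classic Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Assumption: one or more blank lines mark a real paragraph break.
    blocks = re.split(r"\n\s*\n", text.strip())
    paragraphs = []
    for block in blocks:
        # Replace the line breaks inside a paragraph with single spaces.
        lines = [line.strip() for line in block.split("\n")]
        paragraphs.append(" ".join(line for line in lines if line))
    # One line per paragraph, so Word sees one paragraph mark per paragraph.
    return "\n".join(paragraphs)


if __name__ == "__main__":
    sample = "Every line of this\nparagraph was broken.\n\nSo was\nthis one.\n"
    print(unwrap_paragraphs(sample))
```

The same idea works inside Word's Find and Replace, where ^p stands for a paragraph mark: swap ^p^p for a placeholder, replace the remaining ^p with a space, then turn the placeholder back into ^p.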

You might be luckier than I have been in finding shareware for turning
PDF to text. The trouble with PDF is that there is no rhyme or reason
to the order in which things are placed on the page, so a file that
produces a pretty looking page could have sprayed characters on in any
order at all.
 
Michel Bintener

One other solution would be to use Adobe Reader, as it offers an option to
save the text inside a PDF file as a separate text file (File>Save as Text).
This might work for simple PDF files, but I've never really used it, so I
can't comment on how it performs with text in multiple columns and so on.
 
Clive Huggan

Hello Rafael,

I hope you are now having success with removing the earlier versions of Word
(other thread)!

Adobe Acrobat Standard/Pro (and I guess Adobe Reader, since Michel says so
and he is never wrong!) does a good job of saving multi-column text in an
accessible way. This is what I do:

In Acrobat, File => Save As => Format => Text (plain) <= [there is no
advantage in saving in RTF or Word, because the styles applied are not
useful]

Open the ".txt" file in TextEdit.

Command-A to select all, then copy.

Open a new blank document in Word, Edit menu => Paste Special => Unformatted
text. The style applied to all this text will be the style of the paragraph
mark where your insertion point is, desirably a form of body text or, for
many people, Normal.

Then apply heading and other styles in the usual way.

In the PDF files I have "extracted" in this way, the chaos that accompanies
selection of multiple columns in some PDFs has not occurred, but since I
don't do this often and the graphic designers I use are well-behaved, my
experience may be too narrow.

The carriage returns are eliminated and the text is in the best form for
re-styling in Word.

Now to your second question, "How do I format the document in Word with 1st
line indented paragraphs?"

If you follow the above and paste special as body text or Normal, and your
body text or Normal style has a first line indent, the text will take on
that characteristic.

If you don't yet have that characteristic, all you have to do is change the
definition of the style of body text or Normal (there are several ways, some
discussed recently in this NG, but I use Format menu => Style) and make it
the default.


Cheers,

Clive Huggan
Canberra, Australia
(My time zone is 5-11 hours different from the US and Europe, so my
follow-on responses to those regions can be delayed)