How can a single document have a single style with different fonts and sizes?

John Dalberg

I have tried different PDF-to-Word converters. I noticed that while the
output Word document shows different fonts and font sizes, when I click
anywhere in the document, the style shown is always the same. I thought
text with different fonts or sizes would belong to different styles.

How can I change the font size for all text that uses a certain font and
size? When I choose to select all instances of a style (in my case it seems
there's only one style), it selects the whole document. I can't selectively
choose certain paragraphs.
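
Inside Word itself, Find and Replace can search by font and size (More >
Format > Font) rather than by style. Outside Word, the same idea can be
sketched in a script. The following is a minimal sketch, not a definitive
tool: it assumes the converter wrote font and size as direct run formatting
(`w:rFonts` and `w:sz`, where `w:sz` is in half-points), and that you have
already extracted `word/document.xml` from the .docx as a string. The
function name `resize_matching_runs` is hypothetical.

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)


def w(tag):
    """Build a namespace-qualified tag or attribute name."""
    return f"{{{W_NS}}}{tag}"


def resize_matching_runs(document_xml, font, old_pt, new_pt):
    """Change the size of runs whose direct formatting is font/old_pt.

    Returns (new_xml, number_of_runs_changed). Only the w:ascii font name
    and w:sz are inspected; formatting inherited from styles is ignored.
    """
    root = ET.fromstring(document_xml)
    changed = 0
    for run in root.iter(w("r")):
        props = run.find(w("rPr"))
        if props is None:
            continue
        fonts = props.find(w("rFonts"))
        size = props.find(w("sz"))
        if fonts is None or size is None:
            continue
        # w:sz stores the size in half-points, so 10 pt is "20"
        if (fonts.get(w("ascii")) == font
                and size.get(w("val")) == str(int(old_pt * 2))):
            size.set(w("val"), str(int(new_pt * 2)))
            changed += 1
    return ET.tostring(root, encoding="unicode"), changed
```

The result would still need to be written back into the .docx archive, and
a real document may also carry `w:szCs` or other font attributes that this
sketch does not touch.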

A related question: which PDF-to-Word converter can output Word documents
that have different styles?

John Dalberg
 
Graham Mayor

The whole point of PDF is that it is a graphical representation of the
document that is not intended to be edited. Any converter or OCR software
capable of handling the content and converting it to Word will be hit and
miss, and if you are hoping to get an exact facsimile of the original, you
are dreaming. The best plan is usually to extract just the text and rebuild
it from scratch.

For difficult-to-convert PDFs, the best plan is to use a good-quality OCR
package such as Finereader, or you could try PDF2Text.

Once you have the text loaded into Word, it behaves like any other text and
is amenable to Word's extensive formatting capability.

--
<>>< ><<> ><<> <>>< ><<> <>>< <>><<>
Graham Mayor - Word MVP


<>>< ><<> ><<> <>>< ><<> <>>< <>><<>.
 
John Dalberg

Graham Mayor said:
The whole point of PDF is that it is a graphical representation of the
document that is not intended to be edited. Any converter or OCR software
capable of handling the content and converting it to Word will be hit and
miss, and if you are hoping to get an exact facsimile of the original, you
are dreaming. The best plan is usually to extract just the text and
rebuild it from scratch.

When you use a PDF editor, you can edit the text, and it tells you what
font, font size and other attributes were used for a piece of text. If an
editor can do this, why can't a converter create some styles based on these
attributes?

This is a book in electronic form. It's a huge manual task to style all the
headers, paragraphs, code snippets, etc.

John Dalberg
 
Graham Mayor

Then you should have kept a copy of the document that the PDF was created
from to edit. There is no simple way to edit a PDF file. If you are very
fortunate, Acrobat Pro *may* allow you to save the PDF in Word document
format.

 
John Dalberg

Graham Mayor said:
Then you should have kept a copy of the document that the PDF was created
from to edit. There is no simple way to edit a PDF file. If you are very
fortunate, Acrobat Pro *may* allow you to save the PDF in Word document
format.

Why do you assume I created the PDF?
Check out Foxit PDF Editor.

I have Acrobat Pro and it exports to Word. However, it's still not smart
enough. It creates tens of styles, some of which have no instances in the
document. I can see two paragraphs, one following the other. Both have the
same font and size, yet they have different styles. I don't understand what
triggers Acrobat to create different styles for what appears to be the same
style in the PDF, yet it does so for most of them.


John Dalberg
 
Graham Mayor

We are still going round in circles with this, but you are missing the
essential point that PDF is a *graphics* format and if the original document
is not available for reference you are using what is essentially OCR to
recreate a document from the PDF, just as you might with a JPG or TIFF file.
OCR software, even at its best, is not capable of recreating the document
(any document) with 100% accuracy. In my opinion Finereader is the best
choice, but even that will not create the style structure of the original
document and you will have a lot of work on your hands to create an editable
document.

As you apparently didn't create the PDF in the first place, can you obtain
the original document from whoever did - presumably not?

As for Acrobat's own abilities to recreate a PDF, you'll have to take that
up with Adobe. Word is not the issue here.
Had the PDF been created from a graphical representation of the document (as
some are to make them more difficult to recreate) Acrobat would not be able
to save the PDF as an editable document.

 
John Dalberg

Graham Mayor said:
We are still going round in circles with this, but you are missing the
essential point that PDF is a *graphics* format and if the original
document is not available for reference you are using what is essentially
OCR to recreate a document from the PDF, just as you might with a JPG or
TIFF file. OCR software, even at its best, is not capable of recreating
the document (any document) with 100% accuracy. In my opinion Finereader
is the best choice, but even that will not create the style structure of
the original document and you will have a lot of work on your hands to
create an editable document.


It might be in a graphics format, but a PDF file should contain enough
metadata to export the file with more intelligent styles. Explain to me how
an editor like Foxit Editor is able to open a PDF file, lets you select
some text, tells you what font was used plus other attributes, and lets you
edit the text. If an editor is able to do this, why can't a converter dump
these attributes and create Word styles out of them? When I say a Word
style, all I want the style to include is font and size, so that I can
select all instances of text that use the same font and size. Surely the
converter should be able to lump all similar text into a single style.

I don't believe an OCR program will produce a better Word document than a
pdf converter.

I am not trying to recover the original structure. All I want is to be
able to select all instances of a certain style. I don't care if the style
is a dummy style created by the converter that wasn't in the original
document, as long as it defines something like a font and size, and all
text having the same font and size points to that style.
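
The behaviour described above can be sketched in a few lines. This is only
an illustration under stated assumptions, not how any actual converter
works: it assumes the converter has already extracted runs of text as
(text, font, size) tuples, and the function name `assign_dummy_styles` and
the generated style names ("Auto1", "Auto2", ...) are hypothetical.

```python
def assign_dummy_styles(runs):
    """Map every distinct (font, size) pair to one dummy style name.

    runs: iterable of (text, font, size_pt) tuples.
    Returns (styles, tagged) where styles maps (font, size) -> style name
    and tagged is a list of (text, style name) in the original order.
    """
    styles = {}
    tagged = []
    for text, font, size in runs:
        key = (font, size)
        if key not in styles:
            # First time this font/size combination is seen: new style
            styles[key] = f"Auto{len(styles) + 1}"
        tagged.append((text, styles[key]))
    return styles, tagged


runs = [
    ("Chapter 1", "Arial", 16),
    ("Some body text.", "Times New Roman", 10),
    ("print(x)", "Courier New", 9),
    ("More body text.", "Times New Roman", 10),
]
styles, tagged = assign_dummy_styles(runs)
# Both body paragraphs end up pointing at the same dummy style, so
# selecting all instances of that style would select exactly that text.
```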

I am not sure you understand what my goal is. I don't care if the styles
produced by the converter do not resemble the styles of the original
document. If the original used Arial size 10 and the converter produces
font Zulu size 11, that's OK as long as I can select all text of that
style. It will take me a few seconds to select all that text and change it
back to Arial size 10. *BUT* the problem is that the converters *do not
produce different styles*. Accurately reproducing the original styles
(names, etc.) doesn't matter.

Graham Mayor said:
As you apparently didn't create the PDF in the first place, can you
obtain the original document from whoever did - presumably not?

No.

Graham Mayor said:
As for Acrobat's own abilities to recreate a PDF, you'll have to take
that up with Adobe. Word is not the issue here.
Had the PDF been created from a graphical representation of the document
(as some are to make them more difficult to recreate) Acrobat would not
be able to save the PDF as an editable document.

It wasn't created from a graphical representation. It's an eBook and the
author must have been using a word processor.

John Dalberg
 
Graham Mayor

Clever though Foxit is, I do not think it is reading metadata from the
file. With tests on PDF documents created from my own PC, the fonts reported
are not necessarily the fonts used. You would need to ask Foxit (or better
still Adobe, who own the format) what is possible. Until I receive convincing
evidence to the contrary I will stick with my original response.

 
