Import from PDF

Bill

Any chance that ACCESS can import from a PDF? I have a website that outputs
the data to a PDF or a PRN file. I don't know anything about how to import
this into access or if it can be done. Sounds like a good challenge though!
Any help would be greatly appreciated. Thx
 
a a r o n . k e m p f

nope.

but if you were using SQL Server, you should be able to.

SQL Server supports XML, PDF is XML.

-Aaron
 
James A. Fortune

Bill said:
Any chance that ACCESS can import from a PDF? I have a website that outputs
the data to a PDF or a PRN file. I don't know anything about how to import
this into access or if it can be done. Sounds like a good challenge though!
Any help would be greatly appreciated. Thx

Assuming you want to do this entirely in Access rather than using a
program made to do it, text extraction can be very hard or very easy
depending on whether encryption is used, how much compression is
being done within the PDF file, and whether the linearized PDF
format is being used. Maybe it's just the challenge you're looking for
:). A text stream may or may not be compressed (usually with Flate
compression, the deflate method also used by Zip, but not always). If it
is, the stream's dictionary, which comes before the compressed part, will
contain a flag (a /Filter entry such as /FlateDecode) indicating that the
stream is compressed. Unless, of course, the author of the PDF file has
taken the extra step of compressing the text used for the command stream.
It's rare that storage space is at such a premium that command stream
compression is necessary. Almost any text you would be interested in
would be contained in Page objects. For details see:

http://www.adobe.com/devnet/acrobat/pdfs/pdf_reference.pdf
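
To make the compression point concrete, here is a rough sketch in Python (not VBA, and not a real PDF parser) of spotting a Flate-compressed stream and inflating it. It assumes the file is unencrypted, that FlateDecode is the only filter applied, and that simply scanning the raw bytes for "stream ... endstream" is good enough. The names inflate_streams and shown_strings are made up for this illustration.

```python
import re
import zlib

# Rough pattern for "<< dictionary >> stream <data> endstream" in raw PDF bytes.
# The (?!endobj) guard keeps the dictionary match inside a single object.
STREAM_RE = re.compile(
    rb"<<((?:(?!endobj).)*?)>>\s*stream\r?\n(.*?)endstream", re.DOTALL
)

def inflate_streams(pdf_bytes):
    """Return the bytes of every stream, inflating the Flate-compressed ones."""
    decoded = []
    for match in STREAM_RE.finditer(pdf_bytes):
        dictionary, data = match.group(1), match.group(2)
        if b"/FlateDecode" in dictionary:
            try:
                # decompressobj ignores the end-of-line bytes left after the data
                data = zlib.decompressobj().decompress(data)
            except zlib.error:
                continue  # encrypted, damaged, or behind another filter
        else:
            data = data.rstrip(b"\r\n")
        decoded.append(data)
    return decoded

def shown_strings(content_stream):
    """Pull the literal strings passed to the Tj operator (ignores escapes)."""
    return re.findall(rb"\((.*?)\)\s*Tj", content_stream)

# Build a tiny stand-in for one compressed page content stream.
content = b"BT /F1 12 Tf (Hello from Access) Tj ET"
compressed = zlib.compress(content)
sample = (
    b"1 0 obj\n<< /Type /Catalog >>\nendobj\n"
    b"2 0 obj\n<< /Length " + str(len(compressed)).encode()
    + b" /Filter /FlateDecode >>\nstream\n"
    + compressed + b"\nendstream\nendobj\n"
)

for stream in inflate_streams(sample):
    print(shown_strings(stream))
```

Real files add complications this skips: text split across several operators, font encodings that do not map directly to readable characters, and escaped parentheses inside strings.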

James A. Fortune
(e-mail address removed)
 
a a r o n . k e m p f

good stuff.

Of course, SQL Server can search through PDFs using Full Text Search.

RIGHT?
 
James A. Fortune

a said:
good stuff.

Of course, SQL Server can search through PDFs using Full Text Search.

RIGHT?

Well, you could Select All, Copy and Paste from the PDF to get the text
you want, then do a screen capture to get the images and add those to
something like Word as well, but we want to do it in Access --
automatically. To do the things done by VBA in SQL Server you're
probably talking about using .NET. If you'd like to tackle the PDF text
extraction problem using .NET, go ahead. Please post the code here so
that we can compare it with equivalent VBA code.

James A. Fortune
(e-mail address removed)
 
Sascha Trowitzsch

There is a tool called pdftotext that you can find at
http://www.foolabs.com/xpdf/download.html
Download the precompiled binaries for Windows there.
You just need pdftotext.exe from the package. It is a command-line utility.
You tell it the location of the PDF file and it extracts the contents to a
text file. With certain switches (for example -layout), the text is formatted
similarly to the PDF's layout.
I use this tool for full-text search in PDFs from within Access. I call it via
ShellExecute and then wait for the process to complete. That can be done with
the WaitForSingleObject API, which needs a process handle (ShellExecuteEx can
return one; plain ShellExecute does not). The easier way is to loop until
the exported text file exists. After that the text file is read (Open file...)
as a record into a table.
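
The same run, wait, read, store sequence can be sketched outside Access. The Python below is only an illustration under some assumptions: pdftotext is on the PATH, its output can be read as UTF-8 (depending on the build it may be Latin-1), and an SQLite table named PdfText stands in for the Access table. The function names, table name, and column names are all invented for the example.

```python
import sqlite3
import subprocess
from pathlib import Path

def build_command(pdf_path, txt_path, exe="pdftotext", keep_layout=True):
    """Assemble the pdftotext command line; -layout keeps the page layout."""
    cmd = [exe]
    if keep_layout:
        cmd.append("-layout")
    cmd += [str(pdf_path), str(txt_path)]
    return cmd

def pdf_to_text(pdf_path, txt_path, exe="pdftotext"):
    """Run pdftotext and return the extracted text."""
    # subprocess.run blocks until the process exits, which plays the role of
    # waiting on the process handle; no polling for the output file is needed.
    subprocess.run(build_command(pdf_path, txt_path, exe), check=True)
    # Assumption: UTF-8 output; undecodable bytes are replaced, not fatal.
    return Path(txt_path).read_text(encoding="utf-8", errors="replace")

def store_text(conn, pdf_name, text):
    """Store one PDF's text as a single record (hypothetical table layout)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS PdfText (FileName TEXT, Content TEXT)"
    )
    conn.execute(
        "INSERT INTO PdfText (FileName, Content) VALUES (?, ?)",
        (pdf_name, text),
    )
    conn.commit()
```

Usage would look something like store_text(conn, "report.pdf", pdf_to_text("report.pdf", "report.txt")), with conn coming from sqlite3.connect(...). In Access itself the insert would go through a recordset or an append query instead.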

Sascha
 
