mwebb415
Hi,
I'm trying to take a structured text file (saved from a PDF file) and
read it into an Excel worksheet with a macro. The problem is that the
structure isn't straightforward. Every section in the file contains
~50 rows, and the delimiters aren't consistent. For example:
Section1
Header line
Customer: Acme Rockets Address: 22 Middle Street
State: AZ
Product: Super Rocket Qty: 12
.. . .
Section2
Header line
Customer: Acme Fireworks Address: 66 B Street
State: AB
Product: Coyote Killer Qty: 24
.. . .
The "Header line" is always the same, and not needed in the Excel file.
I want the worksheet to have one row of data for each section.
Customer        Address           State  Product        Qty
Acme Rockets    22 Middle Street  AZ     Super Rocket   12
Acme Fireworks  66 B Street       AB     Coyote Killer  24
I did look at the often-linked page:
http://www.cpearson.com/excel/imptext.htm
But since my delimiters are not consistent, I'm not sure how to adapt
that approach. Also, since the fields don't each start on a new line,
I'm having trouble coming up with the best way to break them out.
I was thinking of keeping an array of delimiters (the field labels) and
cycling through it as I read each line of the file, but with the
approach from the link above, that gets problematic when a field falls
on a different line than expected. Does anyone have any suggestions?
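To make the idea concrete, here is a minimal sketch of the label-scanning logic. It is written in Python rather than VBA purely to show the approach; the same steps could probably be ported to a macro with InStr and Mid. The names LABELS and parse_sections are made up for illustration. It assumes every section starts with a line like "Section1" and that the only labels are the five in the sample above; the real file may well differ.

```python
import re

# Field labels taken from the sample above (assumption: the real file
# may contain more labels, which would need to be added here).
LABELS = ["Customer", "Address", "State", "Product", "Qty"]

# Matches any known label followed by a colon, wherever it sits in a line.
LABEL_RE = re.compile(r"\b(" + "|".join(LABELS) + r"):\s*")

# Assumption: a section begins with a line such as "Section1".
SECTION_RE = re.compile(r"^Section\d+\s*$")


def parse_sections(text):
    """Return one dict per section, mapping each label to its value."""
    rows = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if SECTION_RE.match(line):
            # A new section starts: begin a new output row.
            current = {}
            rows.append(current)
            continue
        if current is None:
            continue  # ignore anything before the first section
        # A value runs from the end of its label to the start of the
        # next label on the same line, or to the end of the line.
        matches = list(LABEL_RE.finditer(line))
        for i, m in enumerate(matches):
            end = matches[i + 1].start() if i + 1 < len(matches) else len(line)
            current[m.group(1)] = line[m.end():end].strip()
    return rows
```

Because the labels are located by name rather than by position, it no longer matters which line a field lands on; lines with no known label (such as the header line) are simply skipped. The obvious weakness is that a value containing text like "State:" would be split in the wrong place.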
The text file from the PDF appears to be the only option - HTML and XML
both end up representing the details on the page as images. Same for
RTF or DOC files.
Thanks
Matt