How to extract raw text from columns

Dave Miles

I have a Word doc that the author created columns in, and I need to get
the raw text. If I save it as txt, the formatting gets messed up.

When I look at the page (or print it) I see something like:


Date: xx/xx/xx Name: Fred
Time: xx:xx Occupation: Tech Support

When I save as text, or select, copy and paste into Notepad, I see
something like:

Date: xx/xx/xx Name:
Time: xx:xx Occupation:
Fred
Tech Support

I have a lot more info on the page which makes it impossible for me to
parse it out. Is there a way to just remove the columns and preserve
the same text on the same line?

Thanks!
 
Dave Miles

It's not a table, so that option is not available to me :(

macropod said:
Hi Dave,

Have you tried Table|Convert|Table to Text?

--
Cheers
macropod
[Microsoft MVP - Word]


Doug Robbins - Word MVP

Send me a copy of the document to look at.

--
Hope this helps

Doug Robbins - Word MVP
Please reply only to the newsgroups unless you wish to avail yourself of my
services on a paid, professional basis.

macropod

So what sort of column arrangement are you using? And how do you keep the items aligned?

--
Cheers
macropod
[Microsoft MVP - Word]


Doug Robbins - Word MVP

Hi Paul,

Dave sent me one of the documents and I believe that it may have been
produced via OCR.

I am sending him the following response:

You can clean up the document a lot by using Edit>Replace to first replace
^b with nothing to remove all of the Section Breaks, then ^n with nothing to
remove the column breaks, then use Ctrl+A to select everything and use the
Format Paragraph dialog to set the paragraph indents to 0 and the Special
Indent to None. Then use Edit>Replace again to replace ^t with ^p.
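
For anyone working on the extracted text outside Word, the same cleanup can be roughly sketched in Python. This is only an illustration, not the Word procedure itself. It assumes the section and column breaks show up in the extracted text as the control characters chr(12) and chr(14), and it uses a hypothetical helper name, clean_text. Stripping leading whitespace stands in for zeroing the paragraph indents, and dropping empty lines is an extra step that is not part of the instructions above.

```python
SECTION_BREAK = "\x0c"  # assumption: how Word's ^b appears in extracted text
COLUMN_BREAK = "\x0e"   # assumption: how Word's ^n appears in extracted text


def clean_text(raw: str) -> str:
    """Approximate the Find/Replace cleanup on a plain string (hypothetical helper)."""
    # Replace ^b and ^n with nothing.
    text = raw.replace(SECTION_BREAK, "").replace(COLUMN_BREAK, "")
    # Normalise paragraph marks to "\n" so the later steps are uniform.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Replace ^t with ^p: every tab becomes a paragraph break.
    text = text.replace("\t", "\n")
    # Stand-in for setting indents to 0: strip surrounding whitespace.
    lines = [line.strip() for line in text.split("\n")]
    # Extra step (not in the original instructions): drop empty lines.
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    sample = "Date: xx/xx/xx\tName:\x0e\r  Fred\x0c"
    print(clean_text(sample))
```

The sample string is made up to resemble the layout described earlier in the thread; real documents may use different break characters.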



A macro could be written to perform all of the above. To further process
the documents (assuming that you have many to do), you could create a list
of the attributes whose values you want to extract, and then use that list
in a macro that iterates through it and inserts a tab after each
attribute. If you then use Convert Text to Table, you would have most of
the information in a two-column table, with the attributes in the first
column and the values in the second column. There would be a few
exceptions, such as the addresses, and a bit more attention would need to
be paid to the Loan Details section.
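
The attribute-list idea can be sketched outside Word as well. The Python below is a rough illustration under stated assumptions, not the macro itself: the attribute names are taken from the sample in the original question, the function names insert_tabs and text_to_table are hypothetical, and each value is assumed to sit on the same line as its attribute after cleanup.

```python
import re

# Assumption: attribute labels taken from the sample in the question.
ATTRIBUTES = ["Date:", "Time:", "Name:", "Occupation:"]


def insert_tabs(text, attributes=ATTRIBUTES):
    """Insert a tab after each known attribute, replacing any spaces that follow it."""
    for attr in attributes:
        pattern = re.escape(attr) + r"[ \t]*"
        text = re.sub(pattern, lambda m, a=attr: a + "\t", text)
    return text


def text_to_table(text):
    """Split each non-empty line at its first tab into (attribute, value)."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        label, _, value = line.partition("\t")
        rows.append((label.strip(), value.strip()))
    return rows


if __name__ == "__main__":
    cleaned = "Date: xx/xx/xx\nName: Fred\nOccupation: Tech Support"
    for row in text_to_table(insert_tabs(cleaned)):
        print(row)
```

Multi-line values such as addresses would not fit this one-line-per-attribute assumption and would need separate handling, as noted above.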



With a bit of work however, and depending upon how similar the documents are
and what you want as the final result, it should be possible to create some
code that would do a fairly complete job of parsing the data from the
document.


--
Hope this helps

Doug Robbins - Word MVP
Please reply only to the newsgroups unless you wish to avail yourself of my
services on a paid, professional basis.

Dave Miles

Hey Doug & Paul,

I think the docs may be generated by Access. I understand that the source
arrives in Excel and the reports are generated from that. Yes, the simple
answer would be to work from the Excel sheets, but they contain more data
than I license, so I have to take what I get... sad but true :(


