Need to extract all tables from a word doc (including nested cells

M

Maxer

Ok I have a bit of an odd problem and to make things worse I've only used VBA
in Access, never word.

I have a massive word document that has several hundred tables.

The tables have (in some cases) nested cells. This happened because the
person making the table would hit ENTER if the text for the cell was too long
so it would create a new line instead of letting it word wrap.

What I would like to do is take out all of those tables, and dump them into
Excel.

However, before I can do that I have to clean up the word document.

The tables are 5 columns. For our purposes let us say they are the following:
FirstName, LastName, Phone, Address, email.

The tables are all organized by zip code, in such a way that instead of
having EVERYONE as one giant excel table. This word document has tables
where each table represents 1 zip code.

So the table's first cell is ZIPCODE 12345
then the column headers, then the data.
So an example:

ZIPCODE 12345
FirstName LastName Phone Address (rest omitted)
FName LName 5555 123Street Apt 2A
Fname2 Lname2 3333 575 Orchard Blvd


Now when I copy it to Excel I would like to have it appear as
12345 FName LName 5555 123Street Apt
2A
12345 Fname2 Lname2 3333 575 Orchard
Blvd


So is there some way I can get it to read the first row, store that, then
when it moves the data to Excel always make the first column of excel ZipCode
that was taken from row 1?
Skip row 2 (as that is the column headings, and I don't need those in Excel)
then start pulling the data on row 3?

Another problem I have is that some of those are nested if the text was too
long.
Take address for example:

ZIPCODE 12345
FirstName LastName Phone Address (rest omitted)
FName LName 5555 123456789 LONG
STREET
NAME
Apt 2A
Fname2 Lname2 3333 575 Orchard Blvd


So now what should have been an entire record on row 3 becomes a partial
record with the rest of the data from column 4 (address) being on rows 4 and
5 and the next record starting on row 6.

This creates a rather painful situation for me as I want the data moved
cleanly into Excel.

I would greatly appreciate any help that can be offered, thank you.
 
J

Jean-Guy Marcil

Maxer was telling us:
Maxer nous racontait que :
Ok I have a bit of an odd problem and to make things worse I've only
used VBA in Access, never word.

I have a massive word document that has several hundred tables.

The tables have (in some cases) nested cells. This happened because
the person making the table would hit ENTER if the text for the cell
was too long so it would create a new line instead of letting it word
wrap.

What I would like to do is take out all of those tables, and dump
them into Excel.

However, before I can do that I have to clean up the word document.

The tables are 5 columns. For our purposes let us say they are the
following: FirstName, LastName, Phone, Address, email.

The tables are all organized by zip code, in such a way that instead
of having EVERYONE as one giant excel table. This word document has
tables where each table represents 1 zip code.

So the table's first cell is ZIPCODE 12345
then the column headers, then the data.
So an example:

ZIPCODE 12345
FirstName LastName Phone Address (rest
omitted) FName LName 5555
123Street Apt 2A Fname2 Lname2 3333
575 Orchard Blvd


Now when I copy it to Excel I would like to have it appear as
12345 FName LName 5555
123Street Apt 2A
12345 Fname2 Lname2 3333 575
Orchard Blvd


So is there some way I can get it to read the first row, store that,
then when it moves the data to Excel always make the first column of
excel ZipCode that was taken from row 1?
Skip row 2 (as that is the column headings, and I don't need those in
Excel) then start pulling the data on row 3?

Another problem I have is that some of those are nested if the text
was too long.
Take address for example:

ZIPCODE 12345
FirstName LastName Phone Address (rest
omitted) FName LName 5555

123456789 LONG STREET
NAME

Apt 2A Fname2 Lname2 3333 575
Orchard Blvd


So now what should have been an entire record on row 3 becomes a
partial record with the rest of the data from column 4 (address)
being on rows 4 and 5 and the next record starting on row 6.

This creates a rather painful situation for me as I want the data
moved cleanly into Excel.

I would greatly appreciate any help that can be offered, thank you.

First, when you use ENTER in a cell, the result is not a nested cell, but a
paragraph mark (¶) in the cell. A nested cell would be a single-cell table
inside another cell.

Now, the ¶ are fairly easy to get rid of.
Without VBA, just do a Find/Replace (Find ^p and Replace " ", i.e. a space
without the double quotes)

If you must use VBA for this, let us know.

Once this is done, the cell content should be "clean" and transfer easily to
Excel.

For the transfer itself, yopu need not only to know WOrd VBA, but Excel VBA
as well.

For a start, see:
http://word.mvps.org/faqs/interdev/controlxlfromword.htm

Then, you will need to read that ZIP code cell, store that in a variable,
process the table from the third row to the last by a creating a range,
copying it, going to Excel and pasting that in the first cell of the second
column and finally populate all cells in the corresponding number of rows in
the first column with the ZIP code variable value. Then set a memory marker
for the second cell in the first available row under the stuff you just
pasted.
Repeat and rinse.

This can be a lot of work... try step by step to write the code and comeback
with specific questions. Or, if you are lucky, maybe someone has already
written code that does this and will be along to share it with you.


--
Salut!
_______________________________________
Jean-Guy Marcil - Word MVP
(e-mail address removed)
Word MVP site: http://www.word.mvps.org
 
M

Maxer

Thank you for your reply, I will start reading up on VBA.

However, with cells that contain multiple lines of data I tried searching as
you suggested but did not return any results inside those cells.

It does seem to be a manual break ^l that is seperating the lines in those
cells.

When I go to move the cells over to Excel with VBA should I just tell the
VBA to remove any ^l that it encounters inside the tables before moving them
to Excel?

If I do a find and replace in Word thusly it does correct the issue I'm
having with it creating a new row just for a second line of text inside a
cell.
 
J

Jean-Guy Marcil

Maxer was telling us:
Maxer nous racontait que :
Thank you for your reply, I will start reading up on VBA.

However, with cells that contain multiple lines of data I tried
searching as you suggested but did not return any results inside
those cells.

It does seem to be a manual break ^l that is seperating the lines in
those cells.

When I go to move the cells over to Excel with VBA should I just tell
the VBA to remove any ^l that it encounters inside the tables before
moving them to Excel?

Yes, you could. But replace the ^l by a space (or some other separator
character) or the last word of a line will by stuck with the first word of
the next line.
If I do a find and replace in Word thusly it does correct the issue
I'm having with it creating a new row just for a second line of text
inside a cell.

I am not sure I follow... It seems that you fixed this problem by replacing
all ^l by a space, no?
So, is there a question here?

--
Salut!
_______________________________________
Jean-Guy Marcil - Word MVP
(e-mail address removed)
Word MVP site: http://www.word.mvps.org
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top