J
Jeff
Need help extracting data from Tables in word documents to access
database
I have thousands of word document invoices that I need to extract
about 25 pieces of data anD create an access database.
The Docs are the same format as shown below and the flags will be
the same as the field name in the DB and each Doc will be a record.
The document has 7 lines of standard text as a title,
then Three tables seperated by line.
Table 1 - 1row x 2cols - no col titles
Need to extract Order Number and Order Date
Note the flag and data are in the same cell for these two
(it would be cool to Split Order Date
into Day, Date and Time fields)
Table 2 - 7-12 Rows x 7 Col - Row could have 1-5products -
Cols first and last are blank
Min Need is the Total: (Bottom of col 6)
(it would be cool to create Five product fields and record
up to the first 5 Item numbers ordered - Col 3)
Table 3 - 15-20 rows x 2 cols - First row is the flags for the
fields and the second row in the data (text)
Missing flags are blank (ie add2 not listed)
All the data flags are predefined,
unque and always the same in the docs
I'm new to VBA, so I don't know if this is easy to do,
seaching the groups I see that people have done similar things,
just not exactly my problem.
Questions
1 Given the 1000's of documents and three tables per doc,
what is the best approach? (ie VBA program or another way?)
2 How difficult is getting the min requirements out to a record?
3 How difficult or doable to getting the "It would be cool to's"?
4 Looks like I would need several sub routines -
anyone have sample code? (ie getting multiple word docs,
Extracting from muliple tables, find flag and get next cell
of data, find flag and date data in same cell, Etc)
5 Anybody do any of this before and willing to consult to make
this successful?
Thanks
Jeff
Example Document (I put !____!____!__ to define the table lines)
_________________________________________________________________
Seven line standard text title (ie Co name and address)
_________________________________________________________________
Order Number: 11222223 ! Order Date: Sat Sep 17 09:21:19 2005 !
_______________________!_______________________________________!
_______________________________________________________________
! Qnty Item Numb Description Price Extended
1 S111-60 (0.1) Product 1 url link $11.11 $11.11
1 S222-60 (0.1) Product 2 url link $11.22 $11.22
1 S333-60 (0.1) Product 3 url link $11.33 $11.33
1 S444-60 (0.1) Product 4 url link $11.44 $11.44
Sub Tot: $45.10
Taxes $0.0
USA - Priority Ship: $5.00
Total: $50.10
!___!___!______________!_____________________________!_______!______!__!
___________________________________
name John Doe !
add1 9999 Some Street
city Any Town
state Statename
zip 99999
country US
phone1 (111)222-9999
email (e-mail address removed)
shipstate Someplace
shipcnty US
shipping byratechart
shiprate USA
!________!________________________!
database
I have thousands of word document invoices that I need to extract
about 25 pieces of data anD create an access database.
The Docs are the same format as shown below and the flags will be
the same as the field name in the DB and each Doc will be a record.
The document has 7 lines of standard text as a title,
then Three tables seperated by line.
Table 1 - 1row x 2cols - no col titles
Need to extract Order Number and Order Date
Note the flag and data are in the same cell for these two
(it would be cool to Split Order Date
into Day, Date and Time fields)
Table 2 - 7-12 Rows x 7 Col - Row could have 1-5products -
Cols first and last are blank
Min Need is the Total: (Bottom of col 6)
(it would be cool to create Five product fields and record
up to the first 5 Item numbers ordered - Col 3)
Table 3 - 15-20 rows x 2 cols - First row is the flags for the
fields and the second row in the data (text)
Missing flags are blank (ie add2 not listed)
All the data flags are predefined,
unque and always the same in the docs
I'm new to VBA, so I don't know if this is easy to do,
seaching the groups I see that people have done similar things,
just not exactly my problem.
Questions
1 Given the 1000's of documents and three tables per doc,
what is the best approach? (ie VBA program or another way?)
2 How difficult is getting the min requirements out to a record?
3 How difficult or doable to getting the "It would be cool to's"?
4 Looks like I would need several sub routines -
anyone have sample code? (ie getting multiple word docs,
Extracting from muliple tables, find flag and get next cell
of data, find flag and date data in same cell, Etc)
5 Anybody do any of this before and willing to consult to make
this successful?
Thanks
Jeff
Example Document (I put !____!____!__ to define the table lines)
_________________________________________________________________
Seven line standard text title (ie Co name and address)
_________________________________________________________________
Order Number: 11222223 ! Order Date: Sat Sep 17 09:21:19 2005 !
_______________________!_______________________________________!
_______________________________________________________________
! Qnty Item Numb Description Price Extended
1 S111-60 (0.1) Product 1 url link $11.11 $11.11
1 S222-60 (0.1) Product 2 url link $11.22 $11.22
1 S333-60 (0.1) Product 3 url link $11.33 $11.33
1 S444-60 (0.1) Product 4 url link $11.44 $11.44
Sub Tot: $45.10
Taxes $0.0
USA - Priority Ship: $5.00
Total: $50.10
!___!___!______________!_____________________________!_______!______!__!
___________________________________
name John Doe !
add1 9999 Some Street
city Any Town
state Statename
zip 99999
country US
phone1 (111)222-9999
email (e-mail address removed)
shipstate Someplace
shipcnty US
shipping byratechart
shiprate USA
!________!________________________!