- Joined
- Sep 8, 2016
- Messages
- 12
- Reaction score
- 1
Short on time and long on the pile of files that need extractions - Used to have a Sub that would do something of this sort many years ago but can't locate it - or any other posts that are similar. Hope a wiz on here can assist as this is an awesome help site!
Need Excel to extract content from Word documents and place it into 4 columns like this:
Column C..........................Col D............Col E.................Col F...........
Field Name......................Condition......Rule Name.........Output.........
ACJ-TRANS-LR-CODE.....05................AA......................ACN-KEY-SUF
WS-DOC-ID......................F7A..............AA......................ACN-KEY-SUF
WS-RECORD-CD..............4..................AA.....................ACN-KEY-SUF
WS-DOC-NR1...................M.................AA.....................ACN-KEY-SUF
FYI:
*Do not need the code to open any files
(User will already have open 1 Excel file (ExampleExtractor.xlsm) and 1 Word file (LogRuleSourceXX.doc) open before running the SUB.
User will open ea Word file one at a time then Run the Sub from Excel
There's a large folder full of Word docs to extract from - so I'm needing a sub like this again to expedite ongoing extractions).
Here's the 4 Basic Rules the Sub Should Accommodate:
1== If Excel finds one of these rule words in All Caps [IF,AND,OR,PERFORM,THRU] in the Word doc content, then copy the content that sits to the immediate RIGHT of that rule word & paste into Excel column C. (copy until a space occurs I think will work).
Here's what the WORD DOCUMENT content format looks like:
-------------------------------------------------------------------------------------
IF (WS-DOC-ID = 'D7A' AND WS-TT = 'N') OR
(WS-DOC-ID = 'D6R' AND WS-TT = 'T') OR
((WS-DOC-ID = 'D6A') AND
(WS-TT = 'N' OR SPACES))
MOVE 'M' TO ACN-KEY-SUF <<< Output name is at RIGHT of "TO", every line abv it relates to this Output name
-------------------------------------------------------------------------------------
==The content is not in tables. One, any or all of the RULE words could potentially be present within a Word doc.
==With each FIND occurrence, paste the extracted content on it's own row within the Excel table Col C (below).
Column C..........................Col D............Col E.....................Col F...........
Field Name......................Condition......Rule Name.............Output.........
ACJ-TRANS-LR-CODE.....05................AA......................ACN-KEY-SUF
WS-DOC-ID......................F7A..............AA......................ACN-KEY-SUF
WS-RECORD-CD..............4..................AA.....................ACN-KEY-SUF
WS-DOC-NR1...................M.................AA.....................ACN-KEY-SUF
WS-RECORD-CD.............4..................AA.....................ACN-KEY-SUF
WS-BGCD........................8..................AA.....................ACN-KEY-SUF
WS-DOC-ID.....................D7A..............AA.....................ACN-KEY-SUF
WS-TT.............................N..................AA......................ACN-KEY-SUF
WS-DOC-ID.....................D6R..............AA.....................ACN-KEY-SUF
WS-TT.............................T...................AA.....................ACN-KEY-SUF
WS-DOC-ID.....................D6A...............AA....................ACN-KEY-SUF
WS-TT.............................N...................AA....................ACN-KEY-SUF
WS-TT............................SPACES........AA....................ACN-KEY-SUF
or can do it like this:
WS-TT............................N,SPACES.....AA...................ACN-KEY-SUF
(for example, LOOK ABOVE to the WORD CONTENT CHUNK:
"WS-TT" appears more than once (3xs),
so it gets listed each time it appears on it's own row in EXCEL, along w/ whatever Condition Code is listed..
(but here's where it gets tricky).......,
The 3rd time it is listed, there's a 'N' code in single quotes, (and) there's an "OR" present followed by a code SPACES.
(SPACES doesn't have single quote marks around it like the code on the LEFT side of "OR" has - so a rule that says pick up both and either record them on 2 individual rows --
(or) if easier, list both N,SPACES on a single line like shown in the above Excel table (either is great):
2== Then, pick up (copy/paste) the content that sits on the immediate RIGHT of the equals sign into Excel column D.
(sometimes that Condition Code content is in quotes - sometimes it's not - so not using quotes in the definition is probably more accurate (to just say in the code to pick up everything to the RIGHT of the equals sign on that same line until a space or end of line occurs) - and I'll parse off any extra garbage that I don't need if something gets picked up)
3== Need the code to look at the Filename of the Word doc and extract the 12th+13th char position, paste that into Col E of Excel
-------------------------------------------------------------------------------------------
the 'Rule Name' is simply 2 characters of the Word doc file name repeated all the way down,
....(if too hard to do or time consuming for more code, just pick up the full file name, that's fine!)
--------------------------------------------------------------------------------------------
4== Last, the code should locate the "Output" content which always follows the rule word: [TO], copy/paste into Col F of Excel
The 'Output' is re-used as it is relational (meaning all the line items listed in that little chunk all tie to/relate to the 'Output' name listed at the end of that chunk signified by preceding the word "TO": (see below)
--------------------------------------------------------------------------------------------
IF (WS-DOC-ID = 'D7A' AND WS-TT = 'N') OR
(WS-DOC-ID = 'D6R' AND WS-TT = 'T') OR
((WS-DOC-ID = 'D6A') AND
(WS-TT = 'N' OR SPACES))
MOVE 'M' TO ACN-KEY-SUF <<< Output name is at RIGHT of "TO", every line abv it relates to this Output name
---------------------------------------------------------------------------------------------
OTHER: (in the example below, following the "IF" rule word, there's 2 Field names:
..............PUBLIC-SALES-CODE
..............PUBLIC-TIV-12
----------------------------------------------------------------------------
WORD CONTENT EXAMPLE:
IF HOLD-REIMB = ‘A’ OR ‘Y’<<<<<<<<<<<<< A & Y would get pasted into Col C either together or individually
IF PUBLIC-SALES-CODE AND PUBLIC-TIV-12
MOVE ‘12’ TO ACH-KEY-IND-VAL
----------------------------------------------------------------------------
(each would be listed on their own rows within the Excel table and both are associated to the ACH-KEY-IND-VAL Output)
Unlike the other IF's.. these PUBLICs don't have a code so, we are to use: "YES" or "NO" as the codes.
"NO" if the word "NOT" precedes it: NOT PUBLIC-SALES-CODE
"YES" alternately
(see the final table example at the bottom of this post to see how they would be laid into the Excel table)
That's it!
(I threw those RULE WORDS into Col A of the provided Excel file (cross-posted) as an idea that the code could look to that column when performing it's LOOKUP/INDEX - but it's probably much better to just hard code the rules into code and manage them there)
Forever in debt to you if you can figure out a decent sub to expedite this painful process...
Thanks, Chris
cross-posted to be able to upload/attach a sample Excel file: http://www.ozgrid.com/forum/showthread.php?t=202211&p=782727#post782727
THE GOAL IS: to use EXCEL to extract content from WORD docs; achieved in the fastest, most efficient way possible... if it's a little messy with picking up a little extra garb at the end of a line, so-be-it- as long as I can collect all the content needed from the Word doc -- I can do some cleaning then pass it on to programmers in a table such as is shown below.
Column C............................Col D..........Col E..............Col F...........
Field Name.........................Condition......Rule Name........Output.........
ACJ-TRANS-LR-CODE...........05..............AA..............ACN-KEY-SUF
WS-DOC-ID.............................F7A............AA..............ACN-KEY-SUF
WS-RECORD-CD......................4...............AA..............ACN-KEY-SUF
WS-DOC-NR1..........................M...............AA..............ACN-KEY-SUF
WS-RECORD-CD.....................4................AA..............ACN-KEY-SUF
WS-BGCD................................8................AA..............ACN-KEY-SUF
WS-DOC-ID.............................D7A............AA..............ACN-KEY-SUF
WS-TT......................................N................AA..............ACN-KEY-SUF
WS-DOC-ID.............................D6R............AA..............ACN-KEY-SUF
WS-TT......................................T.................AA..............ACN-KEY-SUF
WS-DOC-ID..............................D6A.............AA.............ACN-KEY-SUF
WS-TT......................................N.................AA.............ACN-KEY-SUF
PUBLIC-SALES-CODE............YES..............AA.............ACN-KEY-SUF
PUBLIC-TIV-12........................YES..............AA.............ACN-KEY-SUF
NOT-PUBLIC-SALES-CODE....NO...............AA.............ACN-KEY-SUF <<< "NOT" INDICATES "NO" Condition
------------------------------------------------------------------------------------------------
Hope that makes sense and THANKS AGAIN! - (sample file attachment is on cross-post link)
Need Excel to extract content from Word documents and place it into 4 columns like this:
Column C..........................Col D............Col E.................Col F...........
Field Name......................Condition......Rule Name.........Output.........
ACJ-TRANS-LR-CODE.....05................AA......................ACN-KEY-SUF
WS-DOC-ID......................F7A..............AA......................ACN-KEY-SUF
WS-RECORD-CD..............4..................AA.....................ACN-KEY-SUF
WS-DOC-NR1...................M.................AA.....................ACN-KEY-SUF
FYI:
*Do not need the code to open any files
(User will already have open 1 Excel file (ExampleExtractor.xlsm) and 1 Word file (LogRuleSourceXX.doc) open before running the SUB.
User will open ea Word file one at a time then Run the Sub from Excel
There's a large folder full of Word docs to extract from - so I'm needing a sub like this again to expedite ongoing extractions).
Here's the 4 Basic Rules the Sub Should Accommodate:
1== If Excel finds one of these rule words in All Caps [IF,AND,OR,PERFORM,THRU] in the Word doc content, then copy the content that sits to the immediate RIGHT of that rule word & paste into Excel column C. (copy until a space occurs I think will work).
Here's what the WORD DOCUMENT content format looks like:
-------------------------------------------------------------------------------------
IF (WS-DOC-ID = 'D7A' AND WS-TT = 'N') OR
(WS-DOC-ID = 'D6R' AND WS-TT = 'T') OR
((WS-DOC-ID = 'D6A') AND
(WS-TT = 'N' OR SPACES))
MOVE 'M' TO ACN-KEY-SUF <<< Output name is at RIGHT of "TO", every line abv it relates to this Output name
-------------------------------------------------------------------------------------
==The content is not in tables. One, any or all of the RULE words could potentially be present within a Word doc.
==With each FIND occurrence, paste the extracted content on it's own row within the Excel table Col C (below).
Column C..........................Col D............Col E.....................Col F...........
Field Name......................Condition......Rule Name.............Output.........
ACJ-TRANS-LR-CODE.....05................AA......................ACN-KEY-SUF
WS-DOC-ID......................F7A..............AA......................ACN-KEY-SUF
WS-RECORD-CD..............4..................AA.....................ACN-KEY-SUF
WS-DOC-NR1...................M.................AA.....................ACN-KEY-SUF
WS-RECORD-CD.............4..................AA.....................ACN-KEY-SUF
WS-BGCD........................8..................AA.....................ACN-KEY-SUF
WS-DOC-ID.....................D7A..............AA.....................ACN-KEY-SUF
WS-TT.............................N..................AA......................ACN-KEY-SUF
WS-DOC-ID.....................D6R..............AA.....................ACN-KEY-SUF
WS-TT.............................T...................AA.....................ACN-KEY-SUF
WS-DOC-ID.....................D6A...............AA....................ACN-KEY-SUF
WS-TT.............................N...................AA....................ACN-KEY-SUF
WS-TT............................SPACES........AA....................ACN-KEY-SUF
or can do it like this:
WS-TT............................N,SPACES.....AA...................ACN-KEY-SUF
(for example, LOOK ABOVE to the WORD CONTENT CHUNK:
"WS-TT" appears more than once (3xs),
so it gets listed each time it appears on it's own row in EXCEL, along w/ whatever Condition Code is listed..
(but here's where it gets tricky).......,
The 3rd time it is listed, there's a 'N' code in single quotes, (and) there's an "OR" present followed by a code SPACES.
(SPACES doesn't have single quote marks around it like the code on the LEFT side of "OR" has - so a rule that says pick up both and either record them on 2 individual rows --
(or) if easier, list both N,SPACES on a single line like shown in the above Excel table (either is great):
2== Then, pick up (copy/paste) the content that sits on the immediate RIGHT of the equals sign into Excel column D.
(sometimes that Condition Code content is in quotes - sometimes it's not - so not using quotes in the definition is probably more accurate (to just say in the code to pick up everything to the RIGHT of the equals sign on that same line until a space or end of line occurs) - and I'll parse off any extra garbage that I don't need if something gets picked up)
3== Need the code to look at the Filename of the Word doc and extract the 12th+13th char position, paste that into Col E of Excel
-------------------------------------------------------------------------------------------
the 'Rule Name' is simply 2 characters of the Word doc file name repeated all the way down,
....(if too hard to do or time consuming for more code, just pick up the full file name, that's fine!)
--------------------------------------------------------------------------------------------
4== Last, the code should locate the "Output" content which always follows the rule word: [TO], copy/paste into Col F of Excel
The 'Output' is re-used as it is relational (meaning all the line items listed in that little chunk all tie to/relate to the 'Output' name listed at the end of that chunk signified by preceding the word "TO": (see below)
--------------------------------------------------------------------------------------------
IF (WS-DOC-ID = 'D7A' AND WS-TT = 'N') OR
(WS-DOC-ID = 'D6R' AND WS-TT = 'T') OR
((WS-DOC-ID = 'D6A') AND
(WS-TT = 'N' OR SPACES))
MOVE 'M' TO ACN-KEY-SUF <<< Output name is at RIGHT of "TO", every line abv it relates to this Output name
---------------------------------------------------------------------------------------------
OTHER: (in the example below, following the "IF" rule word, there's 2 Field names:
..............PUBLIC-SALES-CODE
..............PUBLIC-TIV-12
----------------------------------------------------------------------------
WORD CONTENT EXAMPLE:
IF HOLD-REIMB = ‘A’ OR ‘Y’<<<<<<<<<<<<< A & Y would get pasted into Col C either together or individually
IF PUBLIC-SALES-CODE AND PUBLIC-TIV-12
MOVE ‘12’ TO ACH-KEY-IND-VAL
----------------------------------------------------------------------------
(each would be listed on their own rows within the Excel table and both are associated to the ACH-KEY-IND-VAL Output)
Unlike the other IF's.. these PUBLICs don't have a code so, we are to use: "YES" or "NO" as the codes.
"NO" if the word "NOT" precedes it: NOT PUBLIC-SALES-CODE
"YES" alternately
(see the final table example at the bottom of this post to see how they would be laid into the Excel table)
That's it!
(I threw those RULE WORDS into Col A of the provided Excel file (cross-posted) as an idea that the code could look to that column when performing it's LOOKUP/INDEX - but it's probably much better to just hard code the rules into code and manage them there)
Forever in debt to you if you can figure out a decent sub to expedite this painful process...
Thanks, Chris
cross-posted to be able to upload/attach a sample Excel file: http://www.ozgrid.com/forum/showthread.php?t=202211&p=782727#post782727
THE GOAL IS: to use EXCEL to extract content from WORD docs; achieved in the fastest, most efficient way possible... if it's a little messy with picking up a little extra garb at the end of a line, so-be-it- as long as I can collect all the content needed from the Word doc -- I can do some cleaning then pass it on to programmers in a table such as is shown below.
Column C............................Col D..........Col E..............Col F...........
Field Name.........................Condition......Rule Name........Output.........
ACJ-TRANS-LR-CODE...........05..............AA..............ACN-KEY-SUF
WS-DOC-ID.............................F7A............AA..............ACN-KEY-SUF
WS-RECORD-CD......................4...............AA..............ACN-KEY-SUF
WS-DOC-NR1..........................M...............AA..............ACN-KEY-SUF
WS-RECORD-CD.....................4................AA..............ACN-KEY-SUF
WS-BGCD................................8................AA..............ACN-KEY-SUF
WS-DOC-ID.............................D7A............AA..............ACN-KEY-SUF
WS-TT......................................N................AA..............ACN-KEY-SUF
WS-DOC-ID.............................D6R............AA..............ACN-KEY-SUF
WS-TT......................................T.................AA..............ACN-KEY-SUF
WS-DOC-ID..............................D6A.............AA.............ACN-KEY-SUF
WS-TT......................................N.................AA.............ACN-KEY-SUF
PUBLIC-SALES-CODE............YES..............AA.............ACN-KEY-SUF
PUBLIC-TIV-12........................YES..............AA.............ACN-KEY-SUF
NOT-PUBLIC-SALES-CODE....NO...............AA.............ACN-KEY-SUF <<< "NOT" INDICATES "NO" Condition
------------------------------------------------------------------------------------------------
Hope that makes sense and THANKS AGAIN! - (sample file attachment is on cross-post link)