parse text document

H

HammerJoe

I wonder if anyone could get me started on how to parse the text body
of an email.

The body looks like this:


--------------------|-----------|-----------|-----------|-----------|------­-----|-----------|-----------|-------------
|Sunday |Monday |Tuesday |Wednesday |
Thursday |Friday |Saturday |
|03/09/08 |03/10/08 |03/11/08 |03/12/08 |
03/13/08 |03/14/08 |03/15/08 |
--------------------|-----------|-----------|-----------|-----------|------­-----|-----------|-----------|-------------
7550 |OFF | 0930-1730| 0900-1700| 0900-1700|
0930-1730| 0930-1730|OFF |
Joe, Hammer |L 1400-1500|T 0900-1300|T 0900-1300|L 1400-1500|
L 1400-1500| |
| | |L 1300-1400|L
1300-1400| | | |
| | |T 1400-1700|T
1400-1700| | | |
--------------------|-----------|-----------|-----------|-----------|------­-----|-----------|-----------|-------------

(I apologize if it doesnt show correctly, but the lines are all
aligned.)

The email contains a list of schedules for several people.
I can get the email and treat as a text file.

The first 'column' is dependent on the longest name of all the
individuals, the weekly 'columns' are static in size but as you can
see they can have several 'rows'.
These events are triggered by a letter code in front of the hour.
I need to parse each person and find any special events and make a
table out of them

In my example I need to look for all the times that have a 'T', for ie
Tuesday 0900-1300 and 1400-1700 is what I need to collect.

I have been playing with this for awhile and all I come up is a messy
spaghetti code and I not getting anywhere.
My difficulty is that the info is not located in the same place but it
is dynamic the #of rows are different, the starting coloumn is
different, etc.
I would appreciate any help that would get me started on this.
Thanks.
 
K

Klaus Linke

Hi Joe,

Let's do it in the user interface first...
One way to skin the cat:

Select the text and choose from the menu "Table > Convert... > Text to
table".

A dialog window will come up.

Probably the separator will already be selected as "Other: |", and you can
click "OK".

The table should look pretty good already. You can delete the rows that have
just hyphens (by hand for now).

And you can delete the excess spaces by selecting all cells, but without the
end-of-row-markers(! ... else, you'd center the table in the next step).
Now center the text (Ctrl+E or the button on the formatting toolbar), then
left-align (Ctrl+L), and the spaces should be gone.


In a macro, you could follow pretty much the same steps.

You might delete paragraphs that contain just hyphens and pipes (- and |)
first.

And in a macro, you might search for whitespace in front and after a pipe
and delete it before converting the text to a table, rather than using the
center/left alignment trick.

But there's hardly a better or faster way to create the table than
"ConvertToTable".
If you use the macro recorder, you might want to clean up the code a bit
though.
Say, .ConvertToTable doesn't really need to know how many rows the table
will have, so you may delete "NumRows := ##," to keep the macro more
independent of the text you run it on.

Post back if you have trouble implementing any particular step!
Klaus



I wonder if anyone could get me started on how to parse the text body
of an email.

The body looks like this:


--------------------|-----------|-----------|-----------|-----------|------­-----|-----------|-----------|-------------
|Sunday |Monday |Tuesday |Wednesday |
Thursday |Friday |Saturday |
|03/09/08 |03/10/08 |03/11/08 |03/12/08 |
03/13/08 |03/14/08 |03/15/08 |
--------------------|-----------|-----------|-----------|-----------|------­-----|-----------|-----------|-------------
7550 |OFF | 0930-1730| 0900-1700| 0900-1700|
0930-1730| 0930-1730|OFF |
Joe, Hammer |L 1400-1500|T 0900-1300|T 0900-1300|L 1400-1500|
L 1400-1500| |
| | |L 1300-1400|L
1300-1400| | | |
| | |T 1400-1700|T
1400-1700| | | |
--------------------|-----------|-----------|-----------|-----------|------­-----|-----------|-----------|-------------

(I apologize if it doesnt show correctly, but the lines are all
aligned.)

The email contains a list of schedules for several people.
I can get the email and treat as a text file.

The first 'column' is dependent on the longest name of all the
individuals, the weekly 'columns' are static in size but as you can
see they can have several 'rows'.
These events are triggered by a letter code in front of the hour.
I need to parse each person and find any special events and make a
table out of them

In my example I need to look for all the times that have a 'T', for ie
Tuesday 0900-1300 and 1400-1700 is what I need to collect.

I have been playing with this for awhile and all I come up is a messy
spaghetti code and I not getting anywhere.
My difficulty is that the info is not located in the same place but it
is dynamic the #of rows are different, the starting coloumn is
different, etc.
I would appreciate any help that would get me started on this.
Thanks.
 
K

Klaus Linke

Maybe I read too quickly... and concentrated too much on the "make a table
out of them" rather than the "parsing"?

Anyway, if you need to fetch info from the text file, it is helpful to get
it into a clean table or array first.

For the table see the other post...
For the array, you might try something like the code below.

I don't try to delete the "empty" rows in that code, but you could likely
just ignore them later on.
And I don't try the parsing part, but once you have the things in the array,
it shouldn't be very hard to go through all the cells in all the rows,
determine what kind it is -- empty, a weekday, a date, use the Like operator
for something Like "T [0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]" or
whatever -- and fetch what you need.

Regards,
Klaus


Dim vArray() As Variant
Dim vText As Variant

' get text...
vText = ActiveDocument.Content.Text

' rows...
vText = Split(vText, vbCr)

' cells
ReDim vArray(UBound(vText))
Dim iRow As Long, iCol As Long
For iRow = LBound(vArray) To UBound(vArray)
vArray(iRow) = CVar(Split(vText(iRow), "|"))
Next iRow

' You might clean up the text with Trim, and delete the hyphens:
For iRow = LBound(vArray) To UBound(vArray)
For iCol = LBound(vArray(iRow)) To UBound(vArray(iRow))
vArray(iRow)(iCol) = Trim(vArray(iRow)(iCol))
If Replace(vArray(iRow)(iCol), "-", "") = "" Then
vArray(iRow)(iCol) = ""
End If
Next iCol
Next iRow

' Testing: output in Debug.window
For iRow = LBound(vArray) To UBound(vArray)
For iCol = LBound(vArray(iRow)) To UBound(vArray(iRow))
Debug.Print iRow, iCol, vArray(iRow)(iCol)
Next iCol
Next iRow
 
H

HammerJoe

Klaus,

That was brilliant.
Thank you sir for the excelent help.
I have to say that it took me awhile looking at the code to understand
what it does. :)
But it is exactly what I was looking for.
Excelent, thanks again.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Similar Threads


Top