Extracting Info From Within A Text String

G

Guest

I am trying to parse data but with limited success.

Within an imported file I have data like this:


26/07/2006 MT.:004367 CAPITAL CITY COUNCIL 225200
Ref:200060


29/04/2006 West Country Waterworks Co Inv:7097932959 O/No:2252005207
Trans. ID.:00000505


06/11/2006 Conway Cleaning Inv:StM002L Non Order Trans. ID.:00000619

22/06/2006 E A Gangle & Partners 00000527 Crd. Nt. No.:4823a
Trans. ID.:00000524


12/06/2006 Wessex Purchasing O/No.:2252005217 Trans. ID.:00000218



where the date is in Col C and all the rest is in Col D

For the purposes of this we'll assume that the 5 rows of data are in rows 1
to 5 (if only it were that simple!)

I want to split the text in Col D into Cols E to I where:

Col E has the formula =if(Left(D2,3)="MT.",Left(D2,10),"") so it returns
MT.:004367 for the first row in this example and "" for the rest

and now it gets complicated....

Col F has a formula which returns the supplier so in the examples above it
would bring back CAPITAL CITY COUNCIL in row 1 and then West Country
Waterworks Co, Conway Cleaning, E A Gangle & Partners and Wessex Purchasing
in subsequent rows. So the formula needs to return everything to the left of
the first occurence of either Inv or 0000 or O/No unless the string starts
with MT. where it returns text from the 12th character in the string until
there are multiple spaces

Col G has a formula which returns Inv:and the following numbers until there
is a string of multiple spaces so rows 1,4 and 5 would be "", row 2 would
have Inv:7097932959 and row 3 would have Inv:StM002L

Col H has a formula which returns Crd. Nt. No.:and the following numbers
until there is a space so rows 1,2,3 and 5 would be "", row 4 would have Crd.
Nt. No.:4823a

Col I has a formula which returns "Non-Order" if "Non-Order" appears in the
string or O/No:and the following numbers until there is a space

Col J Returns Trans ID.:and the following digits

By using IF and SEARCH I can tell if and where say "Inv:" exists within the
text string but I can't see how to use this as the start point or setting the
stop point for extracting the following text so I'm really stuck now and
would appreciate some guidance.

Sorry for such a long post but many thanks in anticipation
 
C

CLR

Any way you cut it, there ain't no easy answer to this sort of thing because
of all the unknown variables.........but you might try something like this.
1- go ahead and use your original formula to populate column E
2- in column F, maybe something like this, to get rid of the MT thing off
the lead of some of the cells and have each start with a name......
=if(left(d2,3="MT.",mid(d2,12,99),d2)
3- and maybe something like this to pick up the end
=mid(f2,find(":",f2,1)-4,99)

hth
Vaya con Dios,
Chuck, CABGx3
 
J

John Bundy

Sorry its early for me, this is better, colon could be in non- invoice lines,
this finds all of inv:
=RIGHT(A1,(LEN(A1)-SEARCH("inv:",A1,1))-3)
 
H

Harlan Grove

nospaminlich said:
Within an imported file I have data like this:

26/07/2006 MT.:004367 CAPITAL CITY COUNCIL 225200
Ref:200060

29/04/2006 West Country Waterworks Co Inv:7097932959 O/No:2252005207
Trans. ID.:00000505

06/11/2006 Conway Cleaning Inv:StM002L Non Order Trans. ID.:00000619

22/06/2006 E A Gangle & Partners 00000527 Crd. Nt. No.:4823a
Trans. ID.:00000524

12/06/2006 Wessex Purchasing O/No.:2252005217 Trans. ID.:00000218 >
where the date is in Col C and all the rest is in Col D

For the purposes of this we'll assume that the 5 rows of data are in rows 1
to 5 (if only it were that simple!)

I want to split the text in Col D into Cols E to I where:

Col E has the formula =if(Left(D2,3)="MT.",Left(D2,10),"") so it returns
MT.:004367 for the first row in this example and "" for the rest

Note: looks like your first record is in row 2 rather than row 1.

Regular expressions would be the best tool for this. If you download
and install Laurent Longre's MOREFUNC.XLL add-in, freely available at

http://xcell05.free.fr/english/

you could get the same result with the formula

=REGEX.MID(D2,"MT\.:\S+")
and now it gets complicated....

Col F has a formula which returns the supplier so in the examples above it
would bring back CAPITAL CITY COUNCIL in row 1 and then West Country
Waterworks Co, Conway Cleaning, E A Gangle & Partners and Wessex Purchasing
in subsequent rows. So the formula needs to return everything to the left of
the first occurence of either Inv or 0000 or O/No unless the string starts
with MT. where it returns text from the 12th character in the string until
there are multiple spaces

=REGEX.MID(D2,"[A-Za-z][^ \t:]*(\s[^ \t:]+)*?(?=\s(\s|[^ \t:]+:|
\d{2}).*$)")
Col G has a formula which returns Inv:and the following numbers until there
is a string of multiple spaces so rows 1,4 and 5 would be "", row 2 would
have Inv:7097932959 and row 3 would have Inv:StM002L
=REGEX.MID(D2,"Inv:\S+")

Col H has a formula which returns Crd. Nt. No.:and the following numbers
until there is a space so rows 1,2,3 and 5 would be "", row 4 would have Crd.
Nt. No.:4823a

=REGEX.MID(D2,"Crd\. Nt\. No\.:\S+")
Col I has a formula which returns "Non-Order" if "Non-Order" appears in the
string or O/No:and the following numbers until there is a space

=IF(COUNTIF(D2,"*Non Order*"),"Non-Order",REGEX.MID(D2,"O/No:\S+"))
Col J Returns Trans ID.:and the following digits
....

=REGEX.MID(D2,"Trans\. ID\.:\S+")

This leaves unspecified what to do with Ref:# and substrings of
numerals not preceded by a tag and colon, e.g., 225200 in the first
record.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top