Extracting a numbers from a text string

D

Duke Carey

Using some helper columns, these formulas worked on all 3 of your examples

B1: =RIGHT(A1,LEN(A1)-5-SEARCH("\",SUBSTITUTE(A1,"/","\",4)))
C1: =IF(ISERROR(VALUE(LEFT(B1,1))),RIGHT(B1,LEN(B1)-SEARCH(" ",B1)),B1)
D1: =RIGHT(C1,LEN(C1)-SEARCH("\",SUBSTITUTE(C1," ","\",3)))
E1: =--LEFT(D1,SEARCH(" ",D1)-1)

If you sepnd some time, you can probably next some of these. Better yet,
maybe Harlan Grove will jump in here with an elegant solution. He's a genius
at this sort of stuff


IPerlovsky said:
Yes, price, benchmark yield, and bond yield will always be before spread and
it will occasionally have a guarantor name that will not have a space.
 
I

IPerlovsky

Ron,

Thank you for the tip. I will study these functions to obtain a more
thorough understanding of the application's potential.
 
H

Harlan Grove

IPerlovsky said:
We are getting closer. This one would not work because the numbers
are not always going to be between the 14th and 15th spaces.

Here are some examples of where it would not work:
....

So far you've provided several examples, namely,

A 544646-BA CALIF UN1MM+ 5 7/1/2017 FSA 108.579 3.90 3.82 8 Aaa AAA

1/4/2007 CA 544646-BH LOS ANGELES CALIF UN 1000000 5 7/1/2022 FGIC
108.151 3.96 3.86 10 Aaa AAA Y 7/1/2016

1/8/2007 CA 544644-NC LOS ANGELES CALIF UN 1000000 5.75 7/1/2016 MBIA
115.907 3.74 3.67 7 Aaa AAA N 7/1/2016

1/8/2007 CA 544644-NC LOS ANGELES CALIF UN 1000000 5.75 7/1/2016 MBIA
115.907 3.74 3.67 17 Aaa AAA N 7/1/2016

12/15/2006 CA 544644-L6 LOS ANGELES CALIF UN1MM+ 5 7/1/2018 AMBAC
108.748 3.79 3.71 8 Y Aaa AAA Y 7/1/2015

2/6/2007 CA 544646-BA LOS ANGELES CALIF UN1MM+ 5 7/1/2017 FSA 108.579
3.90 3.82 8 Aaa AAA Y 7/1/2016

These don't all follow the same format, so simple parsing rules won't
work. However, it APPEARS that what MIGHT work would be to ensure that
the last token is nonnumeric, then find the last numeric token, where
tokens are any substrings containing no spaces. If that spec holds,
then define the name seq referring to

=ROW(INDEX(Sheet1!$1:$65536,1,1):INDEX(Sheet1!$1:$65536,255,1))

and if one of these records were in cell B1, try the array formula

=VLOOKUP(9.99999999999999E+307,--MID(TRIM(B1)&"#",
SMALL(IF(MID(" "&TRIM(B1),seq,1)=" ",seq),seq),
SMALL(IF(MID(TRIM(B1)&"# ",seq,1)=" ",seq),seq)
-SMALL(IF(MID(" "&TRIM(B1),seq,1)=" ",seq),seq)),1)

Given the 6 example records above, this formula returns

8
10
7
17
8
8

The trick here is that the TRIM(B1)&"# " terms ensure that the last
token converts to an error rather than a number. The MID call returns
an array of the separate tokens in B1 followed by a lot of #NUM!
errors after the final token. The -- converts the numeric tokens into
numbers and the other tokens into error values, and the VLOOKUP call
given its 1st arg and missing 4th arg returns the last number found in
col 1 of its 2nd arg, the --MID(...) array.

Given all this, you'd be better off following Ron Rosenfeld's advice
about using Laurent Longre's MOREFUNC.XLL add-in.
 
R

Ron Rosenfeld

Ron,

Thank you for the tip. I will study these functions to obtain a more
thorough understanding of the application's potential.

You're welcome. It works on all of the examples you've given.

=REGEX.MID(A1,"(?<=\s)-?\d+(?=\s)",-1)

The Regular Expression is the section within the double quote marks.

Translation:

(?<=\s)

Look for a space (but don't extract it)

followed by

-?

a minus sign which is optional

followed by

\d+

a string of digits

followed by

(?=\s)

a space (but don't extract it).

The -1 at the end says make this the very last sequence in the string that
meets this description.

The following formula returns the same values, but, since we TRIM the spaces
instead of altering the regular expression so as not to return the spaces,
might be easier to understand:

=TRIM(REGEX.MID(A1,"\s-?\d+\s",-1))

The pattern that matches the regex would be:

<space><optional minus sign><series of digits><space>
\s -? \d+ \s



--ron
 
I

IPerlovsky

thanks, advice taken...
--
iperlovsky


Harlan Grove said:
....

So far you've provided several examples, namely,

A 544646-BA CALIF UN1MM+ 5 7/1/2017 FSA 108.579 3.90 3.82 8 Aaa AAA

1/4/2007 CA 544646-BH LOS ANGELES CALIF UN 1000000 5 7/1/2022 FGIC
108.151 3.96 3.86 10 Aaa AAA Y 7/1/2016

1/8/2007 CA 544644-NC LOS ANGELES CALIF UN 1000000 5.75 7/1/2016 MBIA
115.907 3.74 3.67 7 Aaa AAA N 7/1/2016

1/8/2007 CA 544644-NC LOS ANGELES CALIF UN 1000000 5.75 7/1/2016 MBIA
115.907 3.74 3.67 17 Aaa AAA N 7/1/2016

12/15/2006 CA 544644-L6 LOS ANGELES CALIF UN1MM+ 5 7/1/2018 AMBAC
108.748 3.79 3.71 8 Y Aaa AAA Y 7/1/2015

2/6/2007 CA 544646-BA LOS ANGELES CALIF UN1MM+ 5 7/1/2017 FSA 108.579
3.90 3.82 8 Aaa AAA Y 7/1/2016

These don't all follow the same format, so simple parsing rules won't
work. However, it APPEARS that what MIGHT work would be to ensure that
the last token is nonnumeric, then find the last numeric token, where
tokens are any substrings containing no spaces. If that spec holds,
then define the name seq referring to

=ROW(INDEX(Sheet1!$1:$65536,1,1):INDEX(Sheet1!$1:$65536,255,1))

and if one of these records were in cell B1, try the array formula

=VLOOKUP(9.99999999999999E+307,--MID(TRIM(B1)&"#",
SMALL(IF(MID(" "&TRIM(B1),seq,1)=" ",seq),seq),
SMALL(IF(MID(TRIM(B1)&"# ",seq,1)=" ",seq),seq)
-SMALL(IF(MID(" "&TRIM(B1),seq,1)=" ",seq),seq)),1)

Given the 6 example records above, this formula returns

8
10
7
17
8
8

The trick here is that the TRIM(B1)&"# " terms ensure that the last
token converts to an error rather than a number. The MID call returns
an array of the separate tokens in B1 followed by a lot of #NUM!
errors after the final token. The -- converts the numeric tokens into
numbers and the other tokens into error values, and the VLOOKUP call
given its 1st arg and missing 4th arg returns the last number found in
col 1 of its 2nd arg, the --MID(...) array.

Given all this, you'd be better off following Ron Rosenfeld's advice
about using Laurent Longre's MOREFUNC.XLL add-in.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top