Parsing Text When Only One Character Can Be Used To Search

L

LHSallwasser

Hello:
I've spent a couple of hours looking on this site, and have found some
excellent new tips, but still need to ask for assistance. I have a series of
text rows containing one string that I need to parse. There are no
characters I can use as delimeters (inconsistent), and there is no consistent
use of text qualifiers; it's in effect, free form text. Here's a sample
record
------------------------------------------------------------------------------------------
FROM per_requisitions_v@appslink req2, per_vacancies@appslink vac2,
What I need to pull from this string is
------------------------------------------------------------------------------------------
per_requisitions_v@appslink

AND

per_vacancies@appslin
------------------------------------------------------------------------------------------
The only character I can use is the @ symbol; it's what tells me that a
record should be reviewed and contains the desired data. But I need the text
before and after this character. Standard worksheet functions SEARCH and
FIND do not accept a negative start_num value; they won't search and retreive
in reverse. I would greatly appreciate suggestions for pulling this data;
there are thousands of rows, even when I isolate those containing the desired
data. Some of the rows have multiple occurences, and all have much
extraneous data I want to remove.

Thank you for your help,

LHSallwasser
 
B

Biff

Hi!

I notice that each substring starts with "per". Is that a constant?

There's also "req2" and "vac2" after each substring. Is that a constant?

What about the "appslink" after each <at> sign. Are those constants?

How about posting several samples so we can see if we find a pattern.

Biff
 
D

Dave Peterson

One way is to use a UserDefinedFunction.

If you want to try...

Option Explicit
Function ExtractAddr(iStr As String) As Variant

Dim iCtr As Long
Dim aCtr As Long
Dim mySplit As Variant
Dim myArr() As Variant

With Application
iStr = .Trim(.Substitute(iStr, ",", " "))
End With
mySplit = Split97(iStr, " ")
aCtr = 0
For iCtr = LBound(mySplit) To UBound(mySplit)
If InStr(1, mySplit(iCtr), "@", vbTextCompare) Then
aCtr = aCtr + 1
ReDim Preserve myArr(1 To aCtr)
myArr(aCtr) = mySplit(iCtr)
End If
Next iCtr

If aCtr = 0 Then
ExtractAddr = ""
ElseIf Application.Caller.Rows.Count = 1 Then
ExtractAddr = myArr
Else
ExtractAddr = Application.Transpose(myArr)
End If

End Function
Function Split97(sStr As String, sdelim As String) As Variant
'from Tom Ogilvy
Split97 = Evaluate("{""" & _
Application.Substitute(sStr, sdelim, """,""") & """}")
End Function

This goes in a general module.

If you're new to macros, you may want to read David McRitchie's intro at:
http://www.mvps.org/dmcritchie/excel/getstarted.htm


Then you can use it like:

Selecting as many cells (in a row or in a column) that you think you need:

then type:
=extractaddr(A1)

but hit ctrl-shift-enter to fill that selection with this array formula.

If you selected too many cells, you'll get #n/a's back at the end.
 
C

CLR

Of course as Biff suggests, it would be easier with more data to consider,
but try these in the meantime.........

=MID(A1,FIND("per_requisitions_v@",A1,1),LEN(A1)-FIND("req2",A1,1)-6)

=MID(A1,FIND("req2",A1,1)+6,LEN(A1)-FIND("req2",A1,1)-11)

Vaya con Dios,
Chuck, CABGx3
 
L

LHSallwasser

Hello:
Thank you Biff and Chuck for your replies; the problem is that there is no
pattern before the @ symbol in the data, so I could not use your suggestions
or the seeded Excel functions.

Dave - thank you so much for the user defined function! It's exactly what I
needed and works to extract the "before@after" string from the mess of
PL/SQL. I've been reading through David McRitchie's Excel page, and used his
CountOccurrences function. It's great that you and other gurus take the time
to help the not-so-gifted rest of us.

Best regards,

LHollister Sallwasser
 
R

Ron Rosenfeld

Hello:
I've spent a couple of hours looking on this site, and have found some
excellent new tips, but still need to ask for assistance. I have a series of
text rows containing one string that I need to parse. There are no
characters I can use as delimeters (inconsistent), and there is no consistent
use of text qualifiers; it's in effect, free form text. Here's a sample
record:
------------------------------------------------------------------------------------------
FROM per_requisitions_v@appslink req2, per_vacancies@appslink vac2,
What I need to pull from this string is:
------------------------------------------------------------------------------------------
per_requisitions_v@appslink

AND

per_vacancies@appslink
------------------------------------------------------------------------------------------
The only character I can use is the @ symbol; it's what tells me that a
record should be reviewed and contains the desired data. But I need the text
before and after this character. Standard worksheet functions SEARCH and
FIND do not accept a negative start_num value; they won't search and retreive
in reverse. I would greatly appreciate suggestions for pulling this data;
there are thousands of rows, even when I isolate those containing the desired
data. Some of the rows have multiple occurences, and all have much
extraneous data I want to remove.

Thank you for your help,

LHSallwasser

Simple with regular expressions.

What you apparently want is the word (with underscores) preceding the '@' and
the word following it.

If you download and install Longre's free morefunc.xll add-in from
http://xcell05.free.fr/

then the expression:

=REGEX.MID(A1,"\S+@\w+",1)

will pull out the first

and

=REGEX.MID(A1,"\S+@\w+",2)

will pull out the second.

Note that the @ character is causing the regex in the above formulas to be
underlined. That should not really happen in Excel.
--ron
 
D

DawnTreader

Hello

i have a question along the same lines and so i was hoping that posting in
this thread would be ok.

i have a refreshable web query where there are lines of text like this:

Hitachi Deskstar 180GXP 80GB IDE ATA100 7200RPM 2MB 8.8MS Hard Drive 3 Year
MFR Warranty

what i would like to know how to do is get the 80GB out of that test and all
the GB amounts out of the other lines with a formula.

what i am trying to do is create a way for me to see what drive is the best
for the price and i need to get to a numerical value to calculate on.

so first i would have to pull the text for 80GB. then turn it in to a
numerical value. if that takes 2 columns on my sheet that is ok, but the
first part absolutely stumps me.

if anyone can help it would be appreciated.
 
R

Ron Coderre

This is what I came up with...

For text in A1
B1:
=RIGHT(LEFT(A1,FIND("GB",A1)+1),LEN(LEFT(A1,FIND("GB",A1)+1))-LOOKUP(LEN(LEFT(A1,FIND("GB",A1)+1)),FIND("
",LEFT(A1,FIND("GB",A1)+1),ROW(INDEX($A:$A,1,1):INDEX($A:$A,LEN(LEFT(A1,FIND("GB",A1)+1)),1)))))

Using your sample text
A1: Hitachi Deskstar 180GXP 80GB IDE ATA100 7200RPM 2MB 8.8MS Hard Drive 3
Year MFR Warranty

That formula returns "80GB" in B1

Does that help?

***********
Regards,
Ron

XL2002, WinXP-Pro
 
D

DawnTreader

Hello

WOW. thanks. it works great. i am going to have to study it a little bit to
fully understand it, but it works great.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top