Parsing a String - Call to Ron Rosenfeld

J

Jim Berglund

Ron Rosenfeld kindly gave me the following:

Sub ParseAddr()
Dim myRegExp As Object, myMatches As Object
Dim rg As Range, c As Range
Dim i As Long

With ActiveSheet
Range("A:A").Select
Set rg = Selection
Set myRegExp = CreateObject("vbscript.regexp")
myRegExp.Pattern = "^(\D+)\s+(.*)\s(CALGARY|MELBOURNE|SYDNEY)" &
",?\s+([A-Z]{2})\s+(\w+)\s+(\(\d{3}\)\s+\d{3}-\d{4})"

For Each c In rg
If myRegExp.test(c.Text) = True Then
Set myMatches = myRegExp.Execute(c.Text)
For i = 0 To 5
c.Offset(0, i + 1) = myMatches(0).submatches(i)
Next i
End If
Next c
End With
End Sub

It works fine against the following data, creating 5 columns with the Name,
Address, City, Prov, and Telephone Number, respectively.

ABACO CHRISTIAN P 32 PATTERSON ME SW CALGARY, AB T3H2C7 (403) 238-2039
begin_of_the_skype_highlighting (403) 238-2039
end_of_the_skype_highlighting

'Just one tweak needed...

I'd like to separate the last name from the remainder of the name (which may
be anything from "P" to "DR QUINCY R & JULIE" and create 6 columns.

Would someone (preferable Ron), please explain to me how to do this?

Thanks, once again

Jim Berglund
 
R

Ron Rosenfeld

Ron Rosenfeld kindly gave me the following:

Sub ParseAddr()
Dim myRegExp As Object, myMatches As Object
Dim rg As Range, c As Range
Dim i As Long

With ActiveSheet
Range("A:A").Select
Set rg = Selection
Set myRegExp = CreateObject("vbscript.regexp")
myRegExp.Pattern = "^(\D+)\s+(.*)\s(CALGARY|MELBOURNE|SYDNEY)" &
",?\s+([A-Z]{2})\s+(\w+)\s+(\(\d{3}\)\s+\d{3}-\d{4})"

For Each c In rg
If myRegExp.test(c.Text) = True Then
Set myMatches = myRegExp.Execute(c.Text)
For i = 0 To 5
c.Offset(0, i + 1) = myMatches(0).submatches(i)
Next i
End If
Next c
End With
End Sub

It works fine against the following data, creating 5 columns with the Name,
Address, City, Prov, and Telephone Number, respectively.

ABACO CHRISTIAN P 32 PATTERSON ME SW CALGARY, AB T3H2C7 (403) 238-2039
begin_of_the_skype_highlighting (403) 238-2039
end_of_the_skype_highlighting

'Just one tweak needed...

I'd like to separate the last name from the remainder of the name (which may
be anything from "P" to "DR QUINCY R & JULIE" and create 6 columns.

Would someone (preferable Ron), please explain to me how to do this?

Thanks, once again

Jim Berglund

How do we know which name is the "last name" ?

In your earlier examples, it seemd to be the first word. If that is
the case, you can do it by modifying the Regex to split out the first
word from the rest.

Note in the following I used a somewhat different method to select the
range to parse, so as to reduce the number of cells to be processed.

======================================
Option Explicit
Sub ParseAddr()
Dim myRegExp As Object, myMatches As Object
Dim rg As Range, c As Range
Dim i As Long

Set rg = Range("A1") 'first cell in column
Set rg = Range(rg, Cells(Cells.Rows.Count, rg.Column).End(xlUp))
rg.Offset(0, 1).Resize(columnsize:=7).ClearContents

Set myRegExp = CreateObject("vbscript.regexp")
myRegExp.Pattern = _
"^(\D+?)\s(\D*)\s*(.*)\s(CALGARY|MELBOURNE|SYDNEY),?" _
& "\s+([A-Z]{2})\s+(\w+)\s+(\(\d{3}\)\s+\d{3}-\d{4})"

For Each c In rg
If myRegExp.test(c.Text) = True Then
Set myMatches = myRegExp.Execute(c.Text)
For i = 0 To myMatches(0).submatches.Count - 1
c.Offset(0, i + 1) = myMatches(0).submatches(i)
Next i
End If
Next c

End Sub
======================================
 
J

Jim Berglund

Thanks again, Ron.
By the way, if the city is a variable, City, and not a selection from a list
of known cities, could this be handled?

Jim

Ron Rosenfeld said:
Ron Rosenfeld kindly gave me the following:

Sub ParseAddr()
Dim myRegExp As Object, myMatches As Object
Dim rg As Range, c As Range
Dim i As Long

With ActiveSheet
Range("A:A").Select
Set rg = Selection
Set myRegExp = CreateObject("vbscript.regexp")
myRegExp.Pattern = "^(\D+)\s+(.*)\s(CALGARY|MELBOURNE|SYDNEY)" &
",?\s+([A-Z]{2})\s+(\w+)\s+(\(\d{3}\)\s+\d{3}-\d{4})"

For Each c In rg
If myRegExp.test(c.Text) = True Then
Set myMatches = myRegExp.Execute(c.Text)
For i = 0 To 5
c.Offset(0, i + 1) = myMatches(0).submatches(i)
Next i
End If
Next c
End With
End Sub

It works fine against the following data, creating 5 columns with the
Name,
Address, City, Prov, and Telephone Number, respectively.

ABACO CHRISTIAN P 32 PATTERSON ME SW CALGARY, AB T3H2C7 (403) 238-2039
begin_of_the_skype_highlighting (403) 238-2039
end_of_the_skype_highlighting

'Just one tweak needed...

I'd like to separate the last name from the remainder of the name (which
may
be anything from "P" to "DR QUINCY R & JULIE" and create 6 columns.

Would someone (preferable Ron), please explain to me how to do this?

Thanks, once again

Jim Berglund

How do we know which name is the "last name" ?

In your earlier examples, it seemd to be the first word. If that is
the case, you can do it by modifying the Regex to split out the first
word from the rest.

Note in the following I used a somewhat different method to select the
range to parse, so as to reduce the number of cells to be processed.

======================================
Option Explicit
Sub ParseAddr()
Dim myRegExp As Object, myMatches As Object
Dim rg As Range, c As Range
Dim i As Long

Set rg = Range("A1") 'first cell in column
Set rg = Range(rg, Cells(Cells.Rows.Count, rg.Column).End(xlUp))
rg.Offset(0, 1).Resize(columnsize:=7).ClearContents

Set myRegExp = CreateObject("vbscript.regexp")
myRegExp.Pattern = _
"^(\D+?)\s(\D*)\s*(.*)\s(CALGARY|MELBOURNE|SYDNEY),?" _
& "\s+([A-Z]{2})\s+(\w+)\s+(\(\d{3}\)\s+\d{3}-\d{4})"

For Each c In rg
If myRegExp.test(c.Text) = True Then
Set myMatches = myRegExp.Execute(c.Text)
For i = 0 To myMatches(0).submatches.Count - 1
c.Offset(0, i + 1) = myMatches(0).submatches(i)
Next i
End If
Next c

End Sub
======================================
 
R

Ron Rosenfeld

Thanks again, Ron.

You're welcome. Glad to help.
By the way, if the city is a variable, City, and not a selection from a list
of known cities, could this be handled?

You will need to determine a rule that unambiguously differentiates
the City from the rest of the string.

Unless you can do that, you will need a list.

In the examples you've provided in the past, plus knowing that some
cities consist of more than one word, there seems to be too much
variability to devise a rule.

---------------------------------------------------------
A A WOODS 1479 22RD STREET CALGARY AB T3H2C4 (403) 555-9999

AAINEY 1193 STRATHCONA DR SW CALGARY, AB T3H4S1 (403) 293-2671
(403) 293-2671

ABEN CONRAD 14 140 STRATHAVEN CI SW CALGARY, AB T3H2N5 (403) 249-1396
(403) 249-1396


ADAMS-CATHERWOOD T 71 STRATTON HILL RI SW CALGARY, AB T3H1W8 (403)
685-9699
(403) 685-9699


ADAMSON BRYCE P 330 555 STRATHCONA BV SW CALGARY, AB T3H2Z9 (403)
242-2000
(403) 242-2000


AIUTO ANTONIO 511 STRATHCONA ME SW CALGARY, AB T3H1X1 (403) 240-2055
(403) 240-2055


ALBINATI M J 115 STRADWICK RI SW CALGARY, AB T3H1G7 (403) 242-1028
(403) 242-1028


ALEXANDER RONALD R 120 STRATHBURY CI SW CALGARY, AB T3H1P9 (403)
249-2801
(403) 249-2801


AL-JANABY I 35 39 STRATHLE CM SW CALGARY, AB T3H5P8 (403) 256-5690
(403) 256-5690


ALLEN JACK 66 STRATHLEA CR SW CALGARY, AB T3H5A8 (403) 249-8289
(403) 249-8289


AL-SAYED M 11 STRATHLEA CM SW CALGARY, AB T3H5C5 (403) 663-8689
(403) 663-8689


ALSHAWI A 124 STRATHCONA CL SW CALGARY, AB T3H1L3 (403) 685-4070
(403) 685-4070


AMOOZEGAR NEZAM 7001 STRATHRIDGE GA SW CALGARY, AB T3H4R9 (403)
686-1116
(403) 686-1116


ANDA RICHARD 69 STRATHRIDGE CR SW CALGARY, AB T3H3R9 (403) 217-4776
(403) 217-4776


ANDERSON D B 80 STRADBROOKE RI SW CALGARY, AB T3H1T9 (403) 686-0702
(403) 686-0702


ANDERSON J 24 STRATHDALE CL SW CALGARY, AB T3H2K3 (403) 685-1424
(403) 685-1424
-----------------------------------------
 
J

Jim Berglund

How about if the city is entered through an input box as the string variable
"City"? Each user will be located in a different city, and each time a list
is being parsed, it will only contain records for that city.

Jim
 
J

Jim Berglund

....or what if we entered the city from an input box, determined the length
of the name and counted back from the comma?
Jim
 
R

Ron Rosenfeld

...or what if we entered the city from an input box, determined the length
of the name and counted back from the comma?
Jim

If you can control how data is input, then you don't need any of the
parsing routines. Merely use a user form to input the data into the
appropriate form fields.
 
J

Jim Berglund

Sorry, Ron. Let me go back. The application is to manage the impact of Do
Not Call Lists. The data changes daily. We typically work from reverse
phone directories, which list all the numbers in an area code. I I'm trying
to create a list of callable numbers along with whatever contact information
is available in the Reverse Phone Books.

The data in the RPB's is not consistent - but it is consistent enough that
the formula you created works for the data for Calgary. To be generically
useful, I need to be able to allow users to just input the city for their
Area Code. The rest of the data is available...

So, I can use a variable, or it will also work if we remove the reference to
the city - totally. Either will be fine.

I'd like to use a variable - if only to see how it's handled. But if it
can't be, would you please redo the formula and remove the city reference?

Thanks again
Jim
 
R

Ron Rosenfeld

Sorry, Ron. Let me go back. The application is to manage the impact of Do
Not Call Lists. The data changes daily. We typically work from reverse
phone directories, which list all the numbers in an area code. I I'm trying
to create a list of callable numbers along with whatever contact information
is available in the Reverse Phone Books.

The data in the RPB's is not consistent - but it is consistent enough that
the formula you created works for the data for Calgary. To be generically
useful, I need to be able to allow users to just input the city for their
Area Code. The rest of the data is available...

So, I can use a variable, or it will also work if we remove the reference to
the city - totally. Either will be fine.

I'd like to use a variable - if only to see how it's handled. But if it
can't be, would you please redo the formula and remove the city reference?

Thanks again
Jim

Well, you could input a city name and incorporate it into the pattern.

e.g.:

==================================
Option Explicit
Sub ParseAddr()
Dim myRegExp As Object, myMatches As Object
Dim rg As Range, c As Range
Dim i As Long
Dim sCity As String

sCity = InputBox("Name of City: ")

Set rg = Range("A1") 'first cell in column
Set rg = Range(rg, Cells(Cells.Rows.Count, rg.Column).End(xlUp))
rg.Offset(0, 1).Resize(columnsize:=7).ClearContents

Set myRegExp = CreateObject("vbscript.regexp")
myRegExp.ignorecase = True
myRegExp.Pattern = _
"^(\D+?)\s(\D*)\s*(.*)\s(" & sCity & "),?" _
& "\s+([A-Z]{2})\s+(\w+)\s+(\(\d{3}\)\s+\d{3}-\d{4})"

For Each c In rg
If myRegExp.test(c.Text) = True Then
Set myMatches = myRegExp.Execute(c.Text)
For i = 0 To myMatches(0).submatches.Count - 1
c.Offset(0, i + 1) = myMatches(0).submatches(i)
Next i
End If
Next c
End Sub
==================================

It would then only process those lines that had the city name
"Calgary".
 
J

Jim Berglund

Exactly! That's perfect.
Thanks so much for staying with me on this one, Ron. I can now substitute
"SAN FRANCISCO" and it works. I'm delighted.
Jim

Ron Rosenfeld said:
Sorry, Ron. Let me go back. The application is to manage the impact of Do
Not Call Lists. The data changes daily. We typically work from reverse
phone directories, which list all the numbers in an area code. I I'm
trying
to create a list of callable numbers along with whatever contact
information
is available in the Reverse Phone Books.

The data in the RPB's is not consistent - but it is consistent enough that
the formula you created works for the data for Calgary. To be generically
useful, I need to be able to allow users to just input the city for their
Area Code. The rest of the data is available...

So, I can use a variable, or it will also work if we remove the reference
to
the city - totally. Either will be fine.

I'd like to use a variable - if only to see how it's handled. But if it
can't be, would you please redo the formula and remove the city reference?

Thanks again
Jim

Well, you could input a city name and incorporate it into the pattern.

e.g.:

==================================
Option Explicit
Sub ParseAddr()
Dim myRegExp As Object, myMatches As Object
Dim rg As Range, c As Range
Dim i As Long
Dim sCity As String

sCity = InputBox("Name of City: ")

Set rg = Range("A1") 'first cell in column
Set rg = Range(rg, Cells(Cells.Rows.Count, rg.Column).End(xlUp))
rg.Offset(0, 1).Resize(columnsize:=7).ClearContents

Set myRegExp = CreateObject("vbscript.regexp")
myRegExp.ignorecase = True
myRegExp.Pattern = _
"^(\D+?)\s(\D*)\s*(.*)\s(" & sCity & "),?" _
& "\s+([A-Z]{2})\s+(\w+)\s+(\(\d{3}\)\s+\d{3}-\d{4})"

For Each c In rg
If myRegExp.test(c.Text) = True Then
Set myMatches = myRegExp.Execute(c.Text)
For i = 0 To myMatches(0).submatches.Count - 1
c.Offset(0, i + 1) = myMatches(0).submatches(i)
Next i
End If
Next c
End Sub
==================================

It would then only process those lines that had the city name
"Calgary".
 
R

Ron Rosenfeld

Exactly! That's perfect.
Thanks so much for staying with me on this one, Ron. I can now substitute
"SAN FRANCISCO" and it works. I'm delighted.
Jim

Glad to help.

Thanks for the feedback.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top