T
Tyro
Agreed. I've been massaging computer data for over 40 years. In this case,
it's best to go back to square 1.
it's best to go back to square 1.
It seems like this ought to be easy, but I'm stumped.
I received several thousand address records in the following format--all in
one column of an Excel worksheet:
Smith, John & Mary 123 Main St Chicago, IL 60601
Williams, Bill & Cindy 45 54th Drive Madison, WI 51375
I need to parse these into:
Last Name
First Name(s)
Street Address
City
State
Zip
I can get the last name using =LEFT(B2, FIND(",",B2)-1) and the Zip using
=RIGHT(B2,5) but I can't come up with a way to get the other fields. A quick
scan of the records shows all the street addresses seem to start with a
number, and there are no numbers in any of the first names, so it seems like
if I can find the first number reading from the left, I could grab the first
names, but neither FIND nor SEARCH seem to work. I'm using Excel 2007.
Could anyone point me to a possible approach. Seems as if this is something
others must have solved many times before.
Many thanks,
Tom K
If your layout is always in the same format, and in particular if there are
only single first names, then this can be done using Regular Expressions.
The main problem is to differentiate the first name(s) from the beginning of
the address. The rule that I used in the example below is this pattern:
<comma><space>
<word>
then, optionally:
<space>
ampersand (&)
<space>
<word>
But this will fail if you have a "double" first name. For example:
Smith, Mary Jane 47 Edgecomb rd. ...
If that is the case, we would need to come up with a different rule. But try
this, anyway, and see if it works.
If your addresses always start with a number, we could incorporate that as a
rule and get around the issue of double first names.
Also, you need to list your cities by name. In the Pattern I devised, I listed
Chicago and Madison. You can add more possibilities by extending the
pipe-delimited list in Pattern.
First, enter the UDF into your workbook:
<alt-F11> opens the VB Editor
Ensure your project is highlighted in the Project Explorer window, then
Insert/Module and paste the code below into the window that opens.
The following formulas should work:
Last Name: =remid(A1,"^[^,]*")
First Name(s): =remid(A1,Pattern,1)
Street Address: =remid(A1,Pattern,3)
City: =remid(A1,Pattern,4)
State: =remid(A1,Pattern,5)
Zip: =remid(A1,Pattern,6)
Pattern:
,\s+(\w+(\s+&\s+\w+)?)\s+(.*)\s+(Chicago|Madison),\s+([A-Z]{2})\s+(\d{5}(-\d{4})?)
=============================================
Option Explicit
Function reMid(str As String, spattern As String, Optional Index) As String
Dim re As Object
Dim mc As Object
Set re = CreateObject("vbscript.regexp")
re.Global = True
re.ignorecase = True
re.Pattern = spattern
If re.test(str) = True Then
Set mc = re.Execute(str)
If IsMissing(Index) Then
reMid = mc(0)
Else
reMid = mc(0).submatches(Index - 1)
End If
End If
End Function
=============================================
This should handle any entries that are in the pattern you presented. Examples
that don't work would be useful in trying to refine this routine.
--ron
Martin just pointed out to me that EVERY street address begins with a number.
That being the case, the following is more efficient for picking out the first
name(s), and will handle the issue of two first names: e.g. Mary Jane & Justin
Last Name: =remid(A1,"^[^,]*")
First Name(s): =remid(A1,Pattern,1)
Street Address: =remid(A1,Pattern,2)
City: =remid(A1,Pattern,3)
State: =remid(A1,Pattern,4)
Zip: =remid(A1,Pattern,5)
and
Pattern:
,\s+(\D+)\s+(.*)\s+(Chicago|Madison),\s+([A-Z]{2})\s+(\d{5}(-\d{4})?)
We still need to generate a pipe-delimited list of cities. How to do that
depends on the length of the list.
--ron
Guys:
Thank you all for the many excellent suggestions. I'm grateful that so many
people would try to help out. What a great community!
After carefully scanning thorugh the list, it looks like there are two
instances where the street address does not start with a number (darn) --
those address that start PO Box XXX, or RR# XXX. Most of the first names are
of the format "Spouse 1 & Spouse 2" or just "Spouse 1".
Let me work throught the suggestions from Ron, Martin, and Earl and see if I
have any success. Again many thanks.
Tom K
Guys:
Thank you all for the many excellent suggestions. I'm grateful that so many
people would try to help out. What a great community!
After carefully scanning thorugh the list, it looks like there are two
instances where the street address does not start with a number (darn) --
those address that start PO Box XXX, or RR# XXX. Most of the first names are
of the format "Spouse 1 & Spouse 2" or just "Spouse 1".
Let me work throught the suggestions from Ron, Martin, and Earl and see if I
have any success. Again many thanks.
Tom K
P.S. What does "OP" mean?
Want to reply to this thread or ask your own question?
You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.