Comparing text strings

  • Thread starter Comparing columns with text
  • Start date
C

Comparing columns with text

I have two columns populated with names and I need to compare them for
duplicates. Some duplicate names may contain middle initial or middle names
and some don't. So they may not be EXACT. Please give me suggestions on
formulas or funcions that would help me accomplish this task.
Thanks.
 
R

Ron Rosenfeld

I have two columns populated with names and I need to compare them for
duplicates. Some duplicate names may contain middle initial or middle names
and some don't. So they may not be EXACT. Please give me suggestions on
formulas or funcions that would help me accomplish this task.
Thanks.

This is an interesting problem, and not particularly easy to solve, unless you
can be very specific, and limiting, in the allowable variability.

For example, given the following:

Name = FirstName [MI or Middle Name] LastName

you could construct an algorithm that looks first for an exact match in
FirstName and LastName followed by a comparison of what is in between.

For example:

Name 1 matches in Name 2
No MI of MN anything
MI same MI or MN starting with MI or nothing
MN MI = left(MN,1) or same MN or nothing


Exactly how to construct this algorithm depends critically on how your data is
stored, and how you define "duplicate names"

On the other hand, if you are looking to answer the question, "do two different
names represent the same person", then a Soundex (or NYSIIS) method might be
better.

http://j-walk.com/ss/excel/tips/tip77.htm
http://en.wikipedia.org/wiki/New_York_State_Identification_and_Intelligence_System
--ron
 
S

sajay

really interestgin

one way is to
trim all names and sort colums and compare first few letters then
compare last names (last name with mid() function).

yours
sajay




"Comparing columns with text" <Comparing columns with
(e-mail address removed)> wrote in message
news:[email protected]...
 
R

Rick Rothstein

Maybe this formula?

=AND(LEFT(A1,FIND(" ",A1)-1)=LEFT(B1,FIND("
",B1)-1),TRIM(RIGHT(SUBSTITUTE(A1," ",REPT("
",99)),99))=TRIM(RIGHT(SUBSTITUTE(B1," ",REPT("
",99)),99)),IF(LEN(A1)-LEN(SUBSTITUTE(A1,"
",""))=LEN(B1)-LEN(SUBSTITUTE(B1," ","")),MID(A1,FIND(" ",A1&" ")+1,FIND("
",A1&" ",FIND(" ",A1&" ")+1)-FIND(" ",A1&" ")-1)=MID(B1,FIND(" ",B1&"
")+1,FIND(" ",B1&" ",FIND(" ",B1&" ")+1)-FIND(" ",B1&" ")-1),TRUE))

--
Rick (MVP - Excel)


Ron Rosenfeld said:
I have two columns populated with names and I need to compare them for
duplicates. Some duplicate names may contain middle initial or middle
names
and some don't. So they may not be EXACT. Please give me suggestions on
formulas or funcions that would help me accomplish this task.
Thanks.

This is an interesting problem, and not particularly easy to solve, unless
you
can be very specific, and limiting, in the allowable variability.

For example, given the following:

Name = FirstName [MI or Middle Name] LastName

you could construct an algorithm that looks first for an exact match in
FirstName and LastName followed by a comparison of what is in between.

For example:

Name 1 matches in Name 2
No MI of MN anything
MI same MI or MN starting with MI or nothing
MN MI = left(MN,1) or same MN or nothing


Exactly how to construct this algorithm depends critically on how your
data is
stored, and how you define "duplicate names"

On the other hand, if you are looking to answer the question, "do two
different
names represent the same person", then a Soundex (or NYSIIS) method might
be
better.

http://j-walk.com/ss/excel/tips/tip77.htm
http://en.wikipedia.org/wiki/New_York_State_Identification_and_Intelligence_System
--ron
 
R

Ron Rosenfeld

Maybe this formula?

=AND(LEFT(A1,FIND(" ",A1)-1)=LEFT(B1,FIND("
",B1)-1),TRIM(RIGHT(SUBSTITUTE(A1," ",REPT("
",99)),99))=TRIM(RIGHT(SUBSTITUTE(B1," ",REPT("
",99)),99)),IF(LEN(A1)-LEN(SUBSTITUTE(A1,"
",""))=LEN(B1)-LEN(SUBSTITUTE(B1," ","")),MID(A1,FIND(" ",A1&" ")+1,FIND("
",A1&" ",FIND(" ",A1&" ")+1)-FIND(" ",A1&" ")-1)=MID(B1,FIND(" ",B1&"
")+1,FIND(" ",B1&" ",FIND(" ",B1&" ")+1)-FIND(" ",B1&" ")-1),TRUE))

Maybe, but we don't have enough information yet.

One thought, based on only on looking quickly at your formula, is that it only
compares A1 & B1. My guess would be that we need to compare all of column B
with A1.
--ron
 
R

Rick Rothstein

Maybe this formula?
Maybe, but we don't have enough information yet.

One thought, based on only on looking quickly at your formula, is that it
only
compares A1 & B1. My guess would be that we need to compare all of column
B
with A1.

Ahh, yes, you might be right on that. That would probably require a macro
then, I would guess.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top