Marking Duplicates on a List

R

Rothrock

I've got a list of several thousand names and addresses. I'm trying to
find the duplicates. I've searched the forums here and found this:

=IF(COUNTIF(Range1, FirstCellofRange)>1,"Duplicate","")

And it seems to almost be what I want. But...the first and last name
have been split across two columns. So additionally I would like to
have a row marked as a possible dupe if a variety of things are met.

If cellB + cellC in this row is the same as any other cellB + cellC in
any other row, mark as a possible dupe.

or

If cellJ in this row is the same as any other rows cellJ, mark as a
possible dupe.

Any pointers or other approaches would be great. Thanks.
 
C

Christopher MacLeod

Assuming you've sorted your file by last (column "B") and first names
(column "C") and assuming row 1 is a header row, then a formula such as this
placed in a separate column on row 2 should do the trick:

=IF(AND(B2=B1,C2=C1),"Duplicate","")

Copy the formula down through your entire list. Be aware that accuracy
depends on your sorting your list so that duplicates are clustered. In other
words, this formula won't find duplicates in "any other row," only in the
rows immediately above it.
 
J

JE McGimpsey

Rothrock said:
I've got a list of several thousand names and addresses. I'm trying to
find the duplicates. I've searched the forums here and found this:

=IF(COUNTIF(Range1, FirstCellofRange)>1,"Duplicate","")

And it seems to almost be what I want. But...the first and last name
have been split across two columns. So additionally I would like to
have a row marked as a possible dupe if a variety of things are met.

If cellB + cellC in this row is the same as any other cellB + cellC in
any other row, mark as a possible dupe.

or

If cellJ in this row is the same as any other rows cellJ, mark as a
possible dupe.

Any pointers or other approaches would be great. Thanks.

Take a look here:

http://cpearson.com/excel/duplicat.htm
 
R

Rothrock

Thanks, but that is the link were I found the formula I listed. It
doesn't seem to include how to combine the columns together as well as
take the or into account. Maybe I just don't know how to work the
COUNTIF part? I also realize I could make a new column that adds the
two together, hide it, and use that page's suggestions. I was hoping
for something a little simpler.

I will take another look and maybe I'll understand a bit better this
time.
 
R

Rothrock

Thanks. That is an interesting approach. I will give it a try and see
how that works out.
 
J

JE McGimpsey

Rothrock said:
Thanks, but that is the link were I found the formula I listed. It
doesn't seem to include how to combine the columns together as well as
take the or into account. Maybe I just don't know how to work the
COUNTIF part? I also realize I could make a new column that adds the
two together, hide it, and use that page's suggestions. I was hoping
for something a little simpler.

I will take another look and maybe I'll understand a bit better this
time.

OK - I thought that
or

If cellJ in this row is the same as any other rows cellJ, mark as a
possible dupe.

would be sufficient.

To mark dupes of more than one cell, try:

This is not necessarily the most efficient way, but it works:

=IF(SUMPRODUCT(--(A1=$A$1:$A$100),--(B1=$B$1:$B$100))>1,"Dupe", "")

Copy down as far as required.
 
H

Helpful Harry

Christopher MacLeod said:
Assuming you've sorted your file by last (column "B") and first names
(column "C") and assuming row 1 is a header row, then a formula such as this
placed in a separate column on row 2 should do the trick:

=IF(AND(B2=B1,C2=C1),"Duplicate","")

Copy the formula down through your entire list. Be aware that accuracy
depends on your sorting your list so that duplicates are clustered. In other
words, this formula won't find duplicates in "any other row," only in the
rows immediately above it.

The other problem is that it won't, and can't possibly, find duplicates
with typos ("Toni Jones", "Rony Jones" or "Tony Kones" instead of "Tony
Jones") or where nicknames are used (eg. "Tony Jones" and "Anthony
Jones").

The only real way to find duplicates is by manually reading through the
list.

Helpful Harry
Hopefully helping harassed humans happily handle handiwork hardships ;o)
 
P

Paul Berkowitz

Thanks. That is an interesting approach. I will give it a try and see
how that works out.

You're not quoting any of the message you're replying to, so there's no way
for anyone else to know which approach you found interesting. It might help
others reading this thread if you made it clearer.

--
Paul Berkowitz
MVP MacOffice
Entourage FAQ Page: <http://www.entourage.mvps.org/faq/index.html>
AppleScripts for Entourage: <http://macscripter.net/scriptbuilders/>

Please "Reply To Newsgroup" to reply to this message. Emails will be
ignored.

PLEASE always state which version of Microsoft Office you are using -
**2004**, X or 2001. It's often impossible to answer your questions
otherwise.
 
R

Rothrock

Thanks all y'all.

I was hoping the reply link was attached to the thread to which I was
replying, but evidently not.

The COUNTIF reply was to the absolutely amazing JE McGimpsey's post.

The interesting idea was for Christopher MacLeod. (Most likely also
absolutely amazing I just don't have previous experience with him!)

Now for Helpful Harry. That is why I want to be able to add two columns
and also check things like phone number and stuff. Eventually I want to
kind of have developed a check list - 3 out of 5 and you are most
likely a dupe, 1 out of 5 maybe not, etc. That is also why I just want
to mark them - not automatically delete them.

Finally I've tried JE McGimpsey's latest suggestion:

=IF(SUMPRODUCT(--(A1=$A$1:$A$100),--(B1=$B$1:$B$100))>1,"Dupe", "")


I take it that the $A$1:$A$100 should represent the whole column? Cause
after row 100 it stopped working. Also, I looked up SUMPRODUCT and it
is absolutely lovely that it "multiplies corresponding numeric
components...and returns the sum of those products." But what does that
mean?

Thanks to everybody. I think that between all these different ideas I
will be able to cobble together what I need.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top