I have a rather large spreadsheet of Names, Addresses etc. and I know
there are many duplicates. I would like to know how I can find and highlight
the duplicates in a column. I don't want to remove them, only highlight
them. if the duplicates could be highlighted in a color that would be great.
Please keep in mind that I just started with spreadsheets and I'm not that
familiar with the application. thanks
I do this regularly with small databases of names and addresses (up to
around 25000 items) from commercial companies that are supposedly
clean and de-duped.
Normally I'd start by spliting names into First Name and Last name
(Text->Columns), then I'd apply a sort for the fields I'd be
interested in checking. As most of the lists are UK based I'd normally
start by matching postcodes or zipcode as the postcode information in
the UK relates to small areas. If you have the full US code you can
use this, wider area codes are more difficult.
To match postcodes I'd personally sort the data by postcode then
create an additional column with if(A2=A1,1,"") which marks any rows
which are the same as the row above. I prefer this than using
conditional formating as it allows you to delete the mark from any
rows that you are happy are genuinely different. Also at the end of
checking you can do a Copy->Paste Special->Values to preserve the
marks and sort and delete the rows you didn't want.
You can apply the same to telephone numbers, but you usually have to
do some formating first (eg removing spaces and hyphens). And to email
addresses if you have them.
If there is a lot of overlap then you can do a double sort by postcode
then last name and make the marker match both on postcode and last
name, or you can build a score that measures the level of match - eg
last name, postcode and telephone match mark them at 3, just one of
these mark them at 1 etc.
Names and addresses without the postcode or other more specific data
are more difficult and so are the part I'd do last of all. Usually I'd
sort by last name first with first name/first initial second, then
something like town, possibly with the first character of the street
name. This depends on your data as to how far you'd push it.
Saul
www.notanant.com
communities of websites