Find Unique Records; Must Use Fuzzy Logic

ryguy7272 · Oct 2, 2007

My supervisor and I are trying to come up with a way of finding and
eliminating, all duplicates from a large Excel worksheet which came from BCM
in Outlook. I am contemplating a few different ways of doing this. I found
a free utility on www.downloads.com that allows a user to do a â€˜complex
filterâ€™ to eliminate duplicates, but itâ€™s not really giving me the results I
am looking for. I sorted by Company (Column F) and then by last name (Column
D) and then by first name (Column B). I am now wondering if there is a way
to use some kind of fuzzy logic to do a search for values that are almost
unique and hide the remainder of the rows, or almost duplicate and hide the
remainder of the rows. The issue is that Excel identifies lots of â€˜uniqueâ€™
records because it identifies two people with two different office addresses
as two different records, but for our purposes this is one contact.
Similarly, a contactâ€™s name could be spelled Freuh, in the personal contacts
part of BCM, and the name could also be spelled in Frueh, in the public
contacts part of BCM. Again, these are two â€˜uniqueâ€™ records, but again, for
our purposes this is one contact. All company names are listed in Column F,
all last names are in Column D, and all first names are in Column B. I would
like to copy/paste all data on all rows with unique records (F, D, & B) to a
new sheet, or hide all rows with dupes. Any ideas? Would Access be able to
handle this?

Thanks, as always!
Ryan---

Tom Ogilvy · Oct 2, 2007

While your waiting for a good answer, as an excercise -
Mark the uniques and for the duplicates, list which unique it duplicates:

Frehu
Freuh
Erueh
Fruit
Freh
Rfeuh
Ereuh
Feurh
Fruhe
Furth
Feruh
Ferhu
Fruh
Frueh
Fureh

Now right down the rules you used to do it - define "our purposes". then
maybe someone will have some ideas.

Zone · Oct 2, 2007

Harlan Grove sent me this function years ago when I was trying to compare
two strings to see how similar they were. I also included a simple sub to
run it. It returns a number from 0 to 1, depending on how similar it thinks
the strings are. Works good. The three arguments are the strings to be
compared and a boolean that tells it whether to match the whole string or
any part of it. I haven't used it in a while now, so I'd advise just
setting the 3rd argument to True for now. Hope this helps! James

Sub TrySimilar()
MsgBox StrSimilar("Frued", "Freud", True)
End Sub

Function StrSimilar(s1 As String, s2 As String, FindWhole As Boolean) As
Double
'function returns a numerical grade for 2 similar strings, 1.00 being
perfect
'by H Grove
Dim I As Long, j As Long, k As Long, n(2) As Long
Dim c1 As String, c2 As String
Const alphanum As String = "1234567890abcdefghijklmnopqrstuvwxyz "
s1 = LCase(s1)
For I = 1 To Len(s1)
If Not InStr(alphanum, Mid(s1, I, 1)) > 0 Then Mid(s1, I, 1) = " "
Next I
s1 = Application.WorksheetFunction.Trim(s1)
s2 = LCase(s2)
For j = 1 To Len(s2)
If Not InStr(alphanum, Mid(s2, j, 1)) > 0 Then Mid(s2, j, 1) = " "
Next j
s2 = Application.WorksheetFunction.Trim(s2)
j = 1
n(1) = 0
For I = 1 To Len(s1)
c1 = LCase(Mid(s1, I, 1))
k = 0
Do
c2 = LCase(Mid(s2, j + k, 1))
k = k + 1
Loop Until j + k > Len(s2) Or c1 = c2
If c1 = c2 Then
n(1) = n(1) + 1
If j < Len(s2) Then j = j + 1 Else Exit For
End If
Next I
I = 1
n(2) = 0
For j = 1 To Len(s2)
c2 = LCase(Mid(s2, j, 1))
k = 0
Do
c1 = LCase(Mid(s1, I + k, 1))
k = k + 1
Loop Until I + k > Len(s1) Or c1 = c2
If c1 = c2 Then
n(2) = n(2) + 1
If I < Len(s1) Then I = I + 1 Else Exit For
End If
Next j
If FindWhole Then
StrSimilar = CDbl(Application.WorksheetFunction.Min(n(1), n(2))) _
/ CDbl(Application.WorksheetFunction.Max(Len(s1), Len(s2)))
Else
StrSimilar = CDbl(n(2)) / Len(s2)
End If
End Function

Tom Ogilvy · Oct 2, 2007

MsgBox StrSimilar("Freud", "Frdue", True)
gives a 0.8 as well.

Zone · Oct 2, 2007

Tom, so you test-drove it, huh? Four matching letters out of five would be
80%, so that seems reasonable. James

ryguy7272 · Oct 2, 2007

Thanks Tom, and thanks Zone, sweet function. However, I think I need to do
this thing (somewhat the old fashioned way). I have had ASAP utilities for
quite some time, but haven't used it much. I am using it now, and I am
seeing that it is quite helpful for this current project, but it will take
several iterations of applying filters, and doing sorting, and some other
things. Anyway, here is a link for others who want to obtain this valuable
utility:
http://www.asap-utilities.com/download-asap-utilities.php

Regards,
Ryan--

Zone · Oct 2, 2007

You're welcome. Good luck with that. James

ryguy7272 said:
Thanks Tom, and thanks Zone, sweet function. However, I think I need to
do
this thing (somewhat the old fashioned way). I have had ASAP utilities
for
quite some time, but haven't used it much. I am using it now, and I am
seeing that it is quite helpful for this current project, but it will take
several iterations of applying filters, and doing sorting, and some other
things. Anyway, here is a link for others who want to obtain this
valuable
utility:
http://www.asap-utilities.com/download-asap-utilities.php

Regards,
Ryan--

ryguy7272 · Oct 2, 2007

One more thing...for anyone who stumbles upon this post sometime in the future.
I used this utility (below), along with the other one (above), and got
pretty darn close to where I wanted to be.

http://members.iinet.net.au/~brettdj/#Partial_Column_Match

Regards,
Ryan--

Find/Replace Macro; Need Loop	7	Jul 6, 2009
Combo Boxes as data source for a grapgh	2	Jun 30, 2008
Hiding columns based on value	9	Apr 12, 2008
Search other sheets for numbers & color them	10	Mar 28, 2014
Unique records excel	2	Jun 25, 2008
Cond. Form. Adding In dates & out of dates	0	Aug 5, 2009
Strip out certain rows that have merged cells	2	May 1, 2013
Find Specific Rows in Range and Create New Tab	18	Oct 24, 2013

Find Unique Records; Must Use Fuzzy Logic

ryguy7272

Tom Ogilvy

Zone

Tom Ogilvy

Zone

ryguy7272

Zone

ryguy7272

Ask a Question

Similar Threads