Find Unique Records; Must Use Fuzzy Logic

R

ryguy7272

My supervisor and I are trying to come up with a way of finding and
eliminating, all duplicates from a large Excel worksheet which came from BCM
in Outlook. I am contemplating a few different ways of doing this. I found
a free utility on www.downloads.com that allows a user to do a ‘complex
filter’ to eliminate duplicates, but it’s not really giving me the results I
am looking for. I sorted by Company (Column F) and then by last name (Column
D) and then by first name (Column B). I am now wondering if there is a way
to use some kind of fuzzy logic to do a search for values that are almost
unique and hide the remainder of the rows, or almost duplicate and hide the
remainder of the rows. The issue is that Excel identifies lots of ‘unique’
records because it identifies two people with two different office addresses
as two different records, but for our purposes this is one contact.
Similarly, a contact’s name could be spelled Freuh, in the personal contacts
part of BCM, and the name could also be spelled in Frueh, in the public
contacts part of BCM. Again, these are two ‘unique’ records, but again, for
our purposes this is one contact. All company names are listed in Column F,
all last names are in Column D, and all first names are in Column B. I would
like to copy/paste all data on all rows with unique records (F, D, & B) to a
new sheet, or hide all rows with dupes. Any ideas? Would Access be able to
handle this?

Thanks, as always!
Ryan---
 
T

Tom Ogilvy

While your waiting for a good answer, as an excercise -
Mark the uniques and for the duplicates, list which unique it duplicates:

Frehu
Freuh
Erueh
Fruit
Freh
Rfeuh
Ereuh
Feurh
Fruhe
Furth
Feruh
Ferhu
Fruh
Frueh
Fureh

Now right down the rules you used to do it - define "our purposes". then
maybe someone will have some ideas.
 
Z

Zone

Harlan Grove sent me this function years ago when I was trying to compare
two strings to see how similar they were. I also included a simple sub to
run it. It returns a number from 0 to 1, depending on how similar it thinks
the strings are. Works good. The three arguments are the strings to be
compared and a boolean that tells it whether to match the whole string or
any part of it. I haven't used it in a while now, so I'd advise just
setting the 3rd argument to True for now. Hope this helps! James

Sub TrySimilar()
MsgBox StrSimilar("Frued", "Freud", True)
End Sub

Function StrSimilar(s1 As String, s2 As String, FindWhole As Boolean) As
Double
'function returns a numerical grade for 2 similar strings, 1.00 being
perfect
'by H Grove
Dim I As Long, j As Long, k As Long, n(2) As Long
Dim c1 As String, c2 As String
Const alphanum As String = "1234567890abcdefghijklmnopqrstuvwxyz "
s1 = LCase(s1)
For I = 1 To Len(s1)
If Not InStr(alphanum, Mid(s1, I, 1)) > 0 Then Mid(s1, I, 1) = " "
Next I
s1 = Application.WorksheetFunction.Trim(s1)
s2 = LCase(s2)
For j = 1 To Len(s2)
If Not InStr(alphanum, Mid(s2, j, 1)) > 0 Then Mid(s2, j, 1) = " "
Next j
s2 = Application.WorksheetFunction.Trim(s2)
j = 1
n(1) = 0
For I = 1 To Len(s1)
c1 = LCase(Mid(s1, I, 1))
k = 0
Do
c2 = LCase(Mid(s2, j + k, 1))
k = k + 1
Loop Until j + k > Len(s2) Or c1 = c2
If c1 = c2 Then
n(1) = n(1) + 1
If j < Len(s2) Then j = j + 1 Else Exit For
End If
Next I
I = 1
n(2) = 0
For j = 1 To Len(s2)
c2 = LCase(Mid(s2, j, 1))
k = 0
Do
c1 = LCase(Mid(s1, I + k, 1))
k = k + 1
Loop Until I + k > Len(s1) Or c1 = c2
If c1 = c2 Then
n(2) = n(2) + 1
If I < Len(s1) Then I = I + 1 Else Exit For
End If
Next j
If FindWhole Then
StrSimilar = CDbl(Application.WorksheetFunction.Min(n(1), n(2))) _
/ CDbl(Application.WorksheetFunction.Max(Len(s1), Len(s2)))
Else
StrSimilar = CDbl(n(2)) / Len(s2)
End If
End Function
 
Z

Zone

Tom, so you test-drove it, huh? Four matching letters out of five would be
80%, so that seems reasonable. James
 
R

ryguy7272

Thanks Tom, and thanks Zone, sweet function. However, I think I need to do
this thing (somewhat the old fashioned way). I have had ASAP utilities for
quite some time, but haven't used it much. I am using it now, and I am
seeing that it is quite helpful for this current project, but it will take
several iterations of applying filters, and doing sorting, and some other
things. Anyway, here is a link for others who want to obtain this valuable
utility:
http://www.asap-utilities.com/download-asap-utilities.php


Regards,
Ryan--
 
Z

Zone

You're welcome. Good luck with that. James
ryguy7272 said:
Thanks Tom, and thanks Zone, sweet function. However, I think I need to
do
this thing (somewhat the old fashioned way). I have had ASAP utilities
for
quite some time, but haven't used it much. I am using it now, and I am
seeing that it is quite helpful for this current project, but it will take
several iterations of applying filters, and doing sorting, and some other
things. Anyway, here is a link for others who want to obtain this
valuable
utility:
http://www.asap-utilities.com/download-asap-utilities.php


Regards,
Ryan--
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top