comparing data

S

Sandy

I have an old A2k db I am trying to normalize, and there are many instances
where the same company has been entered multiple times, but typed differently
enough to allow the multpile entries (like "Company"; "The Company"; "The
Company Inc" etc.).

Is there a function that would allow me to compare text data entries that
are similar to one another so I can reconcile them into one entry? Kinda
like a Find Dups query but obviously these aren't exact dups...

TIA -- Any help is greatly appreciated :)
 
D

Douglas J. Steele

There really isn't any easy way of doing what you're trying to do. As fellow
MVP John Vinson likes to say, you need USB: Use Somebody's Brain.
 
S

Sandy

I thought that's what I was trying to do here :)

Guess I will have to keep trudging thru using my own.

Thanks again!
 
D

Dale Fye

Sandy,

I use a function to give me a comparison of two text values. This is far
from perfect, but is a start. Basically, the function receives two text
strings and compares each segment of string1 to each segment of string2.
Rather than trying for an exact match, I actually search to see if the the
elements of string2 are contained within the elements of string1. Then I
reverse the process and check to see if the elements of string1 are in
string2. The function returns a value from 0 to 1 indicating the degree of
similarity.

HTH
Dale

You could use this something like:

SELECT T1.CompanyID, T1.CompanyName, T2.CompanyID, T2.CompanyName,
fnSimilar(T1.CompanyName, T2.CompanyName)
FROM yourTable T1, yourTable T2
WHERE fnSimilar(T1.CompanyName, T2.CompanyName) > 0
ORDER BY fnSimilar(T1.CompanyName, T2.CompanyName) DESC

Public Function fnSimilar(Text1 As Variant, Text2 As Variant) As Single

Dim aText1() As String, aText2() As String
Dim intLoop1 As Integer, intLoop2 As Integer
Dim intWordMatch1 As Integer, intWordMatch2 As Integer

If IsNull(Text1) Or IsNull(Text2) Then
fnSimilar = 0
Exit Function
End If

aText1 = Split(Text1, " ")
aText2 = Split(Text2, " ")

For intLoop1 = LBound(aText1) To UBound(aText1)
For intLoop2 = LBound(aText2) To UBound(aText2)
If InStr(aText1(intLoop1), aText2(intLoop2)) > 0 Then
intWordMatch1 = intWordMatch1 + 1
Exit For
End If
Next intLoop2
Next intLoop1

For intLoop2 = LBound(aText2) To UBound(aText2)
For intLoop1 = LBound(aText1) To UBound(aText1)
If InStr(aText2(intLoop2), aText1(intLoop1)) > 0 Then
intWordMatch2 = intWordMatch2 + 1
Exit For
End If
Next intLoop1
Next intLoop2

fnSimilar = (intWordMatch1 + intWordMatch2) / ((UBound(aText1) -
LBound(aText1) + 1) + (UBound(aText2) - LBound(aText2) + 1))

End Function
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top