Testing for names that sounds alike

  • Thread starter Anthony Fontana
  • Start date
A

Anthony Fontana

When entering new clients, we sometimes do not get correct spelling of their
names. Is there a way to test for names that sound alike but not necessarily
spelled similarly, esp in the first few characters.

Thanks.
 
J

John Spencer MVP

Google Access groups on SOUNDEX code.

Here is one I put together long ago. You would have to compare soundex code
to soundex code and all that calculation could be slow.

WHERE fSoundex(SomeField) = fSoundex(NewValue) A

'=================== VBA code function follows =======================
Public Function fSoundex(strToEncode) As String
'AUTHOR: John Spencer
'LAST MODIFIED: June 30, 1999
'DESCRIPTION: Returns a string encoded as soundex code
'This version parallels the SOUNDEX used in MS SQL 6.5
'Procedure to encode string as soundex code using using the following rules
'Remove all w and h
'With exception of 1st character remove all aeiouy
'encode all letters in string
'collapse adjacent matching digits into one digit (3333 = 3)
'remove any zero values
'expand the code to 6 digits by adding zeroes to the end
'replace the first digit with the first letter of the original name
'KEEP first FOUR characters

Dim strSource As String, strEncode As String
Dim intPosition As Integer
Dim intLength As Integer
Dim strTEMP As String

On Error GoTo fSoundex_Error
'Get rid of leading & trailing spaces
strSource = Trim(strToEncode)

If Len(strSource) < 2 Then
strEncode = strSource & "000000"
Else
'Loop through remaining characters and encode them
For intPosition = 2 To Len(strSource)
Select Case Mid(strSource, intPosition, 1)
Case "b", "f", "p", "v" 'bfpv
strEncode = strEncode & "1"
Case "c", "g", "j", "k", "q", "s", "x", "z" 'cgjkqsxz
strEncode = strEncode & "2"
Case "d", "t" 'dt
strEncode = strEncode & "3"
Case "l" 'l
strEncode = strEncode & "4"
Case "m", "n" 'mn
strEncode = strEncode & "5"
Case "r" 'r
strEncode = strEncode & "6"
Case " " 'Space
strEncode = strEncode & "9"
Case Else
strEncode = strEncode & "0"
End Select
Next intPosition

If Len(strEncode) > 1 Then 'Remove adjacent duplicate codes
intLength = Len(strEncode)
For intPosition = intLength To 2 Step -1
If Mid(strEncode, intPosition - 1, 1) = _
Mid(strEncode, intPosition, 1) Then
strEncode = Mid(strEncode, 1, intPosition - 1) & _
Mid(strEncode, intPosition + 1)
End If
Next intPosition
End If


If Len(strEncode) > 1 Then 'REMOVE ZEROES
intLength = Len(strEncode)
For intPosition = 1 To intLength
If Mid(strEncode, intPosition, 1) <> "0" Then
strTEMP = strTEMP & Mid(strEncode, intPosition, 1)
End If
Next intPosition
strEncode = strTEMP
End If

strEncode = UCase(Left(strSource, 1)) & Mid(strEncode & "000000", 1, 5)

'if there is a space in the name then truncate at the space
If InStr(strEncode, "9") Then
strEncode = Left(strEncode, InStr(strEncode, "9") - 1) & "00000"
End If

End If 'Something is there

'Truncate value to 4 characters to conform with MS SQL 6.5 Soundex length
fSoundex = Mid(strEncode, 1, 4)
Exit Function

fSoundex_Error:
MsgBox Err.Description

End Function

John Spencer
Access MVP 2002-2005, 2007-2009
The Hilltop Institute
University of Maryland Baltimore County
 
D

Danny J. Lesandrini

John:

It seems to me that the SoundEx function worked on names
and could say that Smyth is a lot like Smith. Can it, or your
function tell me that these two strings are alike?

ONC123 prot assign.pdf
ONC123_Protocol_Assignment.pdf

It's probably obvious what I am doing here. Got a directory
full of files that don't follow naming conventions. They are all
close, but spaces where underscores belong and abbrevs.

Assuming I replaced underscores with spaces before I did
the compare (or perhaps not), would your fSoundex function
work on this? If not, something else? Fuzzy compare Fn?
 
M

MGFoster

John said:
Google Access groups on SOUNDEX code.

Here is one I put together long ago. You would have to compare soundex
code to soundex code and all that calculation could be slow.

WHERE fSoundex(SomeField) = fSoundex(NewValue) A

'=================== VBA code function follows =======================
Public Function fSoundex(strToEncode) As String
'AUTHOR: John Spencer
'LAST MODIFIED: June 30, 1999
'DESCRIPTION: Returns a string encoded as soundex code
'This version parallels the SOUNDEX used in MS SQL 6.5
'Procedure to encode string as soundex code using using the following rules
'Remove all w and h
'With exception of 1st character remove all aeiouy
'encode all letters in string
'collapse adjacent matching digits into one digit (3333 = 3)
'remove any zero values
'expand the code to 6 digits by adding zeroes to the end
'replace the first digit with the first letter of the original name
'KEEP first FOUR characters

Dim strSource As String, strEncode As String
Dim intPosition As Integer
Dim intLength As Integer
Dim strTEMP As String

On Error GoTo fSoundex_Error
'Get rid of leading & trailing spaces
strSource = Trim(strToEncode)

If Len(strSource) < 2 Then
strEncode = strSource & "000000"
Else
'Loop through remaining characters and encode them
For intPosition = 2 To Len(strSource)
Select Case Mid(strSource, intPosition, 1)
Case "b", "f", "p", "v" 'bfpv
strEncode = strEncode & "1"
Case "c", "g", "j", "k", "q", "s", "x", "z" 'cgjkqsxz
strEncode = strEncode & "2"
Case "d", "t" 'dt
strEncode = strEncode & "3"
Case "l" 'l
strEncode = strEncode & "4"
Case "m", "n" 'mn
strEncode = strEncode & "5"
Case "r" 'r
strEncode = strEncode & "6"
Case " " 'Space
strEncode = strEncode & "9"
Case Else
strEncode = strEncode & "0"
End Select
Next intPosition

If Len(strEncode) > 1 Then 'Remove adjacent duplicate codes
intLength = Len(strEncode)
For intPosition = intLength To 2 Step -1
If Mid(strEncode, intPosition - 1, 1) = _
Mid(strEncode, intPosition, 1) Then
strEncode = Mid(strEncode, 1, intPosition - 1) & _
Mid(strEncode, intPosition + 1)
End If
Next intPosition
End If


If Len(strEncode) > 1 Then 'REMOVE ZEROES
intLength = Len(strEncode)
For intPosition = 1 To intLength
If Mid(strEncode, intPosition, 1) <> "0" Then
strTEMP = strTEMP & Mid(strEncode, intPosition, 1)
End If
Next intPosition
strEncode = strTEMP
End If

strEncode = UCase(Left(strSource, 1)) & Mid(strEncode & "000000",
1, 5)

'if there is a space in the name then truncate at the space
If InStr(strEncode, "9") Then
strEncode = Left(strEncode, InStr(strEncode, "9") - 1) & "00000"
End If

End If 'Something is there

'Truncate value to 4 characters to conform with MS SQL 6.5 Soundex
length
fSoundex = Mid(strEncode, 1, 4)
Exit Function

fSoundex_Error:
MsgBox Err.Description

End Function

John Spencer
Access MVP 2002-2005, 2007-2009
The Hilltop Institute
University of Maryland Baltimore County

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Here's another Soundex function that uses all numbers. It is shorter
and may run faster....

================================ begin code ========================

Here's one 'Copyright (c) 1992 Ethan Winer. It's from a
book called something like "Basic Techniques and Utilities"
It results in an all numeric code instead of AlphaNumeric.

It makes for faster searches if your storing the soundex codes
and not just using the function as you search. Integers vs
strings.

It results in first letter matches that the alpha code does not.

("cane" = "kane")

That could be useful or annoying depending on your needs.

Static Function ISoundex(InWord As String) As Integer
Dim Word As String
Dim Work As String
Dim wkpos As Integer
Dim PrevCode As Integer
Dim L As Integer
Dim temp As String

Word = UCase$(InWord)
Work = "0000"
wkpos = 1
PrevCode = 0

For L = 1 To Len(Word)
temp = InStr("BFPVCGJKQSXZDTLMNR", Mid$(Word, L, 1))
If temp Then
temp = Asc(Mid$("111122222222334556", temp, 1))
If temp <> PrevCode Then
Mid$(Work, wkpos) = Chr$(temp)
PrevCode = temp
wkpos = wkpos + 1
If wkpos > 4 Then Exit For
End If
Else
PrevCode = 0
End If
Next

ISoundex = Val(Work)

End Function
============================== end code =============================

--
MGFoster:::mgf00 <at> earthlink <decimal-point> net
Oakland, CA (USA)
** Respond only to this newsgroup. I DO NOT respond to emails **

-----BEGIN PGP SIGNATURE-----
Version: PGP for Personal Privacy 5.0
Charset: noconv

iQA/AwUBSdTyLIechKqOuFEgEQJw6QCaAt6tDyiVBVEGQ9UZr+UMP9l7f6MAn1D1
rqBmXrbjtWpMkc7oPJ43kuvN
=vjW/
-----END PGP SIGNATURE-----
 
J

John Spencer MVP

No SOUNDEX is designed for matching names and even there it is not highly
accurate.

Matching something like that is going to be VERY difficult. I would look at
creating a custom vba function to try to regularize the names.

-- replace underscores with spaces
-- break the name down into pieces and keep the first 4 char of each word.
-- Keep the extension

Something like the following UNTESTED function

Public Function regularName(strIN, Optional iLen As Long = 4) As String
Dim StrOut As String
Dim vWords As Variant
Dim I As Long

If Len(Trim(strIN & "")) = 0 Then
regularName = strIN
Else

strIN = Replace(strIN, "_", " ")
vWords = Split(strIN, " ")
For I = 0 To UBound(vWords)
If Len(vWords(I)) > 0 Then
If InStr(vWords(I), ".") > 0 Then
StrOut = StrOut & " " & _
Left(Left(vWords(I), Len(vWords(I)) - 4), iLen) & _
Right(strIN, 4)
Else
StrOut = StrOut & " " & Left(vWords(I), iLen)
End If
End If
Next I
regularName = Mid(StrOut, 2)

End If

End Function

John Spencer
Access MVP 2002-2005, 2007-2009
The Hilltop Institute
University of Maryland Baltimore County
 
A

Anthony Fontana

Thanks. This gives me a lot to work with. I realize that "sounds alike" is
a vague term too.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top