using duplicate function

R

RB

hello
I have a huge list of names; some of which are duplicates. However, its difficult to use the duplicate conditional formatting because there are spelling errors.

For eg. if the list consists of fruits and vegetables:

Apples
Apple
Appel

While they need to be grouped under Apple- duplicate doesnt work on this. Any idea on how to fix this problem?

Thanks
 
C

Claus Busch

Hi,

Am Mon, 6 May 2013 07:36:08 -0700 (PDT) schrieb RB:
Apples
Apple
Appel

look for the longest corresponding part. In your case with the Apples:
"App".
Conditional Formatting => Formula:
=ISNUMBER(SEARCH("App",A1))


Regards
Claus Busch
 
R

RB

hello

I have a huge list of names; some of which are duplicates. However, its difficult to use the duplicate conditional formatting because there are spelling errors.



For eg. if the list consists of fruits and vegetables:



Apples

Apple

Appel



While they need to be grouped under Apple- duplicate doesnt work on this. Any idea on how to fix this problem?



Thanks

Thanks Claus!!!
 
C

Claus Busch

Hi again,

Am Mon, 6 May 2013 07:36:08 -0700 (PDT) schrieb RB:
Apples
Apple
Appel

you can also try:
Find & Select => Replace => Find what="App*" => Replace with "Apples"
Then the spellings for apples are equal.


Regards
Claus Busch
 
R

RB

Hi again,



Am Mon, 6 May 2013 07:36:08 -0700 (PDT) schrieb RB:






you can also try:

Find & Select => Replace => Find what="App*" => Replace with "Apples"

Then the spellings for apples are equal.





Regards

Claus Busch

--

Win XP PRof SP2 / Vista Ultimate SP2

Office 2003 SP2 /2007 Ultimate SP2

Hi Claus,

thanks for your post- however, turns out that this really doesnt help me; in this case, I know what I have to look for.
But what if the list has different names and the spelling is off for some and not for the others.

Eg.
Banana, BANANAS, Apples, APPLELS, Apple, Plums, Plums, Orange, Oranges etc,
In this case: the find and replace wont work especially because there are different names and they are all different.
I need my list to read as : Banana, Apples, Plums, Orange

thanks,
 
C

Claus Busch

Hi,

Am Mon, 6 May 2013 08:49:46 -0700 (PDT) schrieb RB:
But what if the list has different names and the spelling is off for some and not for the others.

Eg.
Banana, BANANAS, Apples, APPLELS, Apple, Plums, Plums, Orange, Oranges etc,
In this case: the find and replace wont work especially because there are different names and they are all different.
I need my list to read as : Banana, Apples, Plums, Orange

you have to do it especially for each name. It is a lot of work, but
then your table is correct.
You can do it also with VBA. Loop through all your names, find and
replace them.


Regards
Claus Busch
 
R

Ron Rosenfeld

hello
I have a huge list of names; some of which are duplicates. However, its difficult to use the duplicate conditional formatting because there are spelling errors.

For eg. if the list consists of fruits and vegetables:

Apples
Apple
Appel

While they need to be grouped under Apple- duplicate doesnt work on this. Any idea on how to fix this problem?

Thanks

You might be able to compute the Levenshtein distance between the words and, if there are only a few letters different, assume they are the same. The macro below assumes that if the distance is one or two letters, then the words are the same, so it will only return the first one. You'll have to test this, and see whether two is appropriate. It does work on your limited sample.

The macro below assumes the list of fruits/vegetables is in column A, and will put the results into the adjacent column.
As written, the results are all capitalized.

If this idea works on your data, the capitalization can be changed; the results column can be changed; and, if necessary, the routine can be sped up considerably.


To enter this Macro (Sub), <alt-F11> opens the Visual Basic Editor.
Ensure your project is highlighted in the Project Explorer window.
Then, from the top menu, select Insert/Module and
paste the code below into the window that opens.

To use this Macro (Sub), <alt-F8> opens the macro dialog box. Select the macro by name, and <RUN>.

==================================
Option Explicit
Sub UniqueSimilars()
Dim Rg As Range, c As Range
Dim rRes As Range
Dim col As Collection
Dim i As Long, j As Long, k As Long
Dim v() As Variant

Set Rg = Range("A1", Cells(Rows.Count, "A").End(xlUp))
Set col = New Collection
On Error Resume Next
For Each c In Rg
col.Add Item:=UCase(c.Text), Key:=c.Text
Next c
On Error GoTo 0

ReDim v(1 To col.Count)
For i = 1 To col.Count
For j = LBound(v) To UBound(v)
k = LD(col(i), v(j))
If k <= 2 Then Exit For
Next j
If k > 2 Then v(i) = col(i)
Next i

j = 1
Set rRes = Rg(1, 1).Offset(0, 1)
rRes.EntireColumn.Clear
For i = LBound(v) To UBound(v)
If Len(v(i)) > 0 Then
rRes(j, 1) = v(i)
j = j + 1
End If
Next i

End Sub

'********************************
'*** Compute Levenshtein Distance
'********************************
'http://www.merriampark.com/ld.htm#VB

Private Function LD(ByVal s As String, ByVal t As String) As Long
Dim d() As Long ' matrix
Dim m As Long ' length of t
Dim n As Long ' length of s
Dim i As Long ' iterates through s
Dim j As Long ' iterates through t
Dim s_i As String ' ith character of s
Dim t_j As String ' jth character of t
Dim cost As Long ' cost

' Step 1
n = Len(s)
m = Len(t)
If n = 0 Then
LD = m
Exit Function
End If
If m = 0 Then
LD = n
Exit Function
End If
ReDim d(0 To n, 0 To m) As Long

' Step 2
For i = 0 To n
d(i, 0) = i
Next i

For j = 0 To m
d(0, j) = j
Next j

' Step 3
For i = 1 To n
s_i = Mid$(s, i, 1)

' Step 4
For j = 1 To m
t_j = Mid$(t, j, 1)

' Step 5
If s_i = t_j Then
cost = 0
Else
cost = 1
End If

' Step 6
d(i, j) = Minimum(d(i - 1, j) + 1, d(i, j - 1) + 1, d(i - 1, j - 1) + cost)

Next j
Next i

' Step 7
LD = d(n, m)
Erase d
End Function

'*******************************
'*** Get minimum of three values
'*******************************

Private Function Minimum(ByVal a As Long, _
ByVal b As Long, _
ByVal c As Long) As Long
Dim mi As Long

mi = a
If b < mi Then
mi = b
End If
If c < mi Then
mi = c
End If

Minimum = mi

End Function
=====================================================
 
W

Walter Briscoe

In message <[email protected]> of Mon, 6 May
2013 20:51:59 in microsoft.public.excel.worksheet.functions, Ron
Rosenfeld said:
You might be able to compute the Levenshtein distance between the words
and, if there are only a few letters different, assume they are the
same. The macro below assumes that if the distance is one or two
letters, then the words are the same, so it will only return the first
one. You'll have to test this, and see whether two is appropriate. It
does work on your limited sample.

Thanks for an interesting article. I had not heard of "Levenshtein
distance". <http://en.wikipedia.org/wiki/Levenshtein_distance>
describes it. My eyes glazed, given the mathematical definition.

I note
'*******************************
'*** Get minimum of three values
'*******************************

Private Function Minimum(ByVal a As Long, _
ByVal b As Long, _
ByVal c As Long) As Long
Dim mi As Long

mi = a
If b < mi Then
mi = b
End If
If c < mi Then
mi = c
End If

Minimum = mi

End Function

Why not use Application.Min(a, b, c)?

As you only use it once, rather than
d(i, j) = Minimum(d(i - 1, j) + 1, d(i, j - 1) + 1, d(i - 1, j -
1) + cost)
I would have used
d(i, j) = Application.Min(d(i - 1, j) + 1, d(i, j - 1) + 1, d(i - 1, j - 1) + cost)

I am sure your reasoning is better than mine. ;)
 
R

Ron Rosenfeld

I would have used
d(i, j) = Application.Min(d(i - 1, j) + 1, d(i, j - 1) + 1, d(i - 1, j - 1) + cost)

I am sure your reasoning is better than mine. ;)

My "reasoning" is poor.

At the time, I was looking just at implementing the concept. The code is a copy of a VB routine that I found at the site noted in the code (which now appears to be non-existent). I never spent any time trying to optimize the code and I suspect that the original coder wrote this in VB, and not VBA, and used ony functions that were available in VB.

However, you provoked me to test the timing on this and, it turns out, at least on the very small subset of 3 numbers I tested, the VB implementation runs much faster than does the Application.Min function.
 
W

Walter Briscoe

In message <[email protected]> of Tue, 7 May
2013 06:07:11 in microsoft.public.excel.worksheet.functions, Ron
Rosenfeld said:
My "reasoning" is poor.

At the time, I was looking just at implementing the concept. The code
is a copy of a VB routine that I found at the site noted in the code
(which now appears to be non-existent). I never spent any time trying
to optimize the code and I suspect that the original coder wrote this
in VB, and not VBA, and used ony functions that were available in VB.

However, you provoked me to test the timing on this and, it turns out,
at least on the very small subset of 3 numbers I tested, the VB
implementation runs much faster than does the Application.Min function.

That is interesting - mainly from the ability to time small pieces of
code. How do you do it? I am guessing you get the difference in time
between running the 2 pieces of code a large number of times and infer
that division gets a reasonable approximation to the time for a single
call. That inference might be false. Optimisers can confuse. ;)

I rarely worry about the time taken to run small pieces of code.
My code usually gets data from the Internet and is IO-bound.
Each round trip to the net takes of the order of a second.
I have yet to work out how to run several transactions in parallel. ;(
 
R

Ron Rosenfeld

That is interesting - mainly from the ability to time small pieces of
code. How do you do it? I am guessing you get the difference in time
between running the 2 pieces of code a large number of times and infer
that division gets a reasonable approximation to the time for a single
call. That inference might be false. Optimisers can confuse. ;)

I use the high precision event timer (HPET). And I got similar results running just a single instance as well as 100 iterations.
I rarely worry about the time taken to run small pieces of code.
My code usually gets data from the Internet and is IO-bound.
Each round trip to the net takes of the order of a second.
I have yet to work out how to run several transactions in parallel.

I rarely worry about it, but I have had some projects which process hundreds of thousands of rows, so speeding up even short segments can have a significant impact.
 
G

GS

I think this is the same algorithm...

Private Function LD&(ByVal s$, ByVal t$)
Dim d()&,m&,i&,n&,i&,j&,cost&
Dim s_i$,t_j$

If (Not Len(s)) or (Not Len(t)) Then LD = 0: Exit Function

'Step1
n = Len(s): m = Len(t): ReDim d(0 To n, 0 To m) As Long

'Step2
For i = 0 To n: d(i, 0) = i: Next 'i
For j = 0 To m: d(0, j) = j: Next 'j

For i = 1 To n
s_i = Mid$(s, i, 1) '//step3

For j = 1 To m
t_j = Mid$(t, j, 1) '//step4

If s_i = t_j Then cost = 0 Else cost = 1 '//step5

d(i, j) = Minimum(d(i - 1, j) + 1, _
d(i, j - 1) + 1, _
d(i - 1, j - 1) + cost) '//step6
Next 'j
Next 'i

'Step7
LD = d(n, m): Erase d
End Function

--
Garry

Free usenet access at http://www.eternal-september.org
Classic VB Users Regroup!
comp.lang.basic.visual.misc
microsoft.public.vb.general.discussion
 
R

Ron Rosenfeld

I think this is the same algorithm...

Private Function LD&(ByVal s$, ByVal t$)
Dim d()&,m&,i&,n&,i&,j&,cost&
Dim s_i$,t_j$

If (Not Len(s)) or (Not Len(t)) Then LD = 0: Exit Function

'Step1
n = Len(s): m = Len(t): ReDim d(0 To n, 0 To m) As Long

'Step2
For i = 0 To n: d(i, 0) = i: Next 'i
For j = 0 To m: d(0, j) = j: Next 'j

For i = 1 To n
s_i = Mid$(s, i, 1) '//step3

For j = 1 To m
t_j = Mid$(t, j, 1) '//step4

If s_i = t_j Then cost = 0 Else cost = 1 '//step5

d(i, j) = Minimum(d(i - 1, j) + 1, _
d(i, j - 1) + 1, _
d(i - 1, j - 1) + cost) '//step6
Next 'j
Next 'i

'Step7
LD = d(n, m): Erase d
End Function

It seems to be exactly the same algorithm, using the same variables, but (to me) making things a bit more obscure by using type declaration characters instead of naming the types, and by combining multiple lines into a single line. Others may prefer this type of presentation.

Also, I don't see the Minimum function in what you've posted.
 
G

GS

It seems to be exactly the same algorithm, using the same variables,
but (to me) making things a bit more obscure by using type
declaration characters instead of naming the types, and by combining
multiple lines into a single line. Others may prefer this type of
presentation.

Also, I don't see the Minimum function in what you've posted.

Yeah, that was how I copied it from the original webpage into my text
editor. I used the type symbols to shorten things. I got used to
reading these in older VB scripts and so I adopted using them for
common types instead of typing long hand!

I was searching for where I had put this in a code module after
revising it to do Min or Max. It didn't seem to work as expected for
some reason and so I was hoping your link would shed some light but my
browser said the page couldn't be found.

Here's the revised code I found...

Function LD&(s$, t$, Optional MinOrMax&)
Dim d() As Long, m&, n&, i&, j&, cost&
Dim s_i$, t_j$

If (Len(s) = 0) Or (Len(t) = 0) Then LD = 0: Exit Function

'Initialize array
n = Len(s): m = Len(t): ReDim d(0 To n, 0 To m)
For i = 0 To n: d(i, 0) = i: Next 'i
For j = 0 To m: d(0, j) = j: Next 'j

'Load array with min values
For i = 1 To n
s_i = Mid$(s, i, 1)

For j = 1 To m
t_j = Mid$(t, j, 1)

If s_i = t_j Then cost = 0 Else cost = 1

d(i, j) = Get_MinMax(d(i - 1, j) + 1, _
d(i, j - 1) + 1, _
d(i - 1, j - 1) + cost, MinOrMax)
Next 'j
Next 'i
LD = d(n, m): Erase d
End Function

Private Function Get_MinMax&(ByVal a&, ByVal b&, ByVal c&, Optional
Spec&)
Dim mi&
mi = a
If Spec = 0 Then
If b < mi Then mi = b
If c < mi Then mi = c
Else
If b > mi Then mi = b
If c > mi Then mi = c
End If 'Min = 0
Get_MinMax = mi
End Function

--
Garry

Free usenet access at http://www.eternal-september.org
Classic VB Users Regroup!
comp.lang.basic.visual.misc
microsoft.public.vb.general.discussion
 
G

GS

I noticed arg defs and a var name did not get changed...

Private Function Get_MinMax&(a&, b&, c&, Optional Spec&)
Dim mi&
mi = a
If Spec = 0 Then
If b < mi Then mi = b
If c < mi Then mi = c
Else
If b > mi Then mi = b
If c > mi Then mi = c

End If 'Spec = 0
Get_MinMax = mi
End Function

--
Garry

Free usenet access at http://www.eternal-september.org
Classic VB Users Regroup!
comp.lang.basic.visual.misc
microsoft.public.vb.general.discussion
 
R

Ron Rosenfeld

I used the type symbols to shorten things. I got used to
reading these in older VB scripts and so I adopted using them for
common types instead of typing long hand!

I thought I recalled that you liked using the type symbols.

It didn't seem to work as expected for
some reason and so I was hoping your link would shed some light but my
browser said the page couldn't be found.

Here's the revised code I found...

This revised code seems to function OK with (very) limited testing. Where does it give unexpected results for you?
 
G

GS

I thought I recalled that you liked using the type symbols.

Yes, as I said I adopted them due to reading lots of VB code that had
them. Once one gets used to them there's no problem identifying type
since I still prefix vars with lowercase type chars...

Dim lLastRow&, sFilename$, iPos%, dlTime#, slTime!

... As Long, String, Integer, Double, Single respectively.
This revised code seems to function OK with (very) limited testing.
Where does it give unexpected results for you?

It doesn't return Min correctly, IMO, when testing via the Immediate
Window. I never used this in a project, ..just thought it was an
interesting function to have available. It would help if I had the
benefit of reading that webpage so I could better understand how to use
it. What test data did you process to test how it works?

--
Garry

Free usenet access at http://www.eternal-september.org
Classic VB Users Regroup!
comp.lang.basic.visual.misc
microsoft.public.vb.general.discussion
 
R

Ron Rosenfeld

It doesn't return Min correctly, IMO, when testing via the Immediate
Window. I never used this in a project, ..just thought it was an
interesting function to have available. It would help if I had the
benefit of reading that webpage so I could better understand how to use
it. What test data did you process to test how it works?

I just tested the LD function. Not the MinMax function.
 
G

GS

I just tested the LD function. Not the MinMax function.

I was able to find some good documentation online and so I now have a
fresh understanding of how this function works AND how it should be
used!<g> In retrospect, IMO, it has nothing to do with
WorksheetFunctions Min/Max!

--
Garry

Free usenet access at http://www.eternal-september.org
Classic VB Users Regroup!
comp.lang.basic.visual.misc
microsoft.public.vb.general.discussion
 
R

RB

You might be able to compute the Levenshtein distance between the words and, if there are only a few letters different, assume they are the same. The macro below assumes that if the distance is one or two letters, then thewords are the same, so it will only return the first one. You'll have to test this, and see whether two is appropriate. It does work on your limited sample.



The macro below assumes the list of fruits/vegetables is in column A, andwill put the results into the adjacent column.

As written, the results are all capitalized.



If this idea works on your data, the capitalization can be changed; the results column can be changed; and, if necessary, the routine can be sped upconsiderably.





To enter this Macro (Sub), <alt-F11> opens the Visual Basic Editor.

Ensure your project is highlighted in the Project Explorer window.

Then, from the top menu, select Insert/Module and

paste the code below into the window that opens.



To use this Macro (Sub), <alt-F8> opens the macro dialog box. Select the macro by name, and <RUN>.



==================================

Option Explicit

Sub UniqueSimilars()

Dim Rg As Range, c As Range

Dim rRes As Range

Dim col As Collection

Dim i As Long, j As Long, k As Long

Dim v() As Variant



Set Rg = Range("A1", Cells(Rows.Count, "A").End(xlUp))

Set col = New Collection

On Error Resume Next

For Each c In Rg

col.Add Item:=UCase(c.Text), Key:=c.Text

Next c

On Error GoTo 0



ReDim v(1 To col.Count)

For i = 1 To col.Count

For j = LBound(v) To UBound(v)

k = LD(col(i), v(j))

If k <= 2 Then Exit For

Next j

If k > 2 Then v(i) = col(i)

Next i



j = 1

Set rRes = Rg(1, 1).Offset(0, 1)

rRes.EntireColumn.Clear

For i = LBound(v) To UBound(v)

If Len(v(i)) > 0 Then

rRes(j, 1) = v(i)

j = j + 1

End If

Next i



End Sub



'********************************

'*** Compute Levenshtein Distance

'********************************

'http://www.merriampark.com/ld.htm#VB



Private Function LD(ByVal s As String, ByVal t As String) As Long

Dim d() As Long ' matrix

Dim m As Long ' length of t

Dim n As Long ' length of s

Dim i As Long ' iterates through s

Dim j As Long ' iterates through t

Dim s_i As String ' ith character of s

Dim t_j As String ' jth character of t

Dim cost As Long ' cost



' Step 1

n = Len(s)

m = Len(t)

If n = 0 Then

LD = m

Exit Function

End If

If m = 0 Then

LD = n

Exit Function

End If

ReDim d(0 To n, 0 To m) As Long



' Step 2

For i = 0 To n

d(i, 0) = i

Next i



For j = 0 To m

d(0, j) = j

Next j



' Step 3

For i = 1 To n

s_i = Mid$(s, i, 1)



' Step 4

For j = 1 To m

t_j = Mid$(t, j, 1)



' Step 5

If s_i = t_j Then

cost = 0

Else

cost = 1

End If



' Step 6

d(i, j) = Minimum(d(i - 1, j) + 1, d(i, j - 1) + 1, d(i - 1, j - 1) + cost)



Next j

Next i



' Step 7

LD = d(n, m)

Erase d

End Function



'*******************************

'*** Get minimum of three values

'*******************************



Private Function Minimum(ByVal a As Long, _

ByVal b As Long, _

ByVal c As Long) As Long

Dim mi As Long



mi = a

If b < mi Then

mi = b

End If

If c < mi Then

mi = c

End If



Minimum = mi



End Function

=====================================================

Thanks All!
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top