Removing duplicates from multiple arrays

T

Tushar Mehta

I wrote this as an intellectual exercise after reading Dick Kusleika's blog
post on the same subject at
http://www.dicks-blog.com/archives/2006/01/16/remove-duplicates-from-arrays/
and thought I'd share it here.

Dick's code is very straightforward and with his to-the-point comments easy
to understand. The code below provides a different take on the subject
addressing issues like:

The requirement of balancing the returned arrays is improved through the use
of the 'Rebalance' argument. If true, the code runs a little slower but
returns better balanced arrays.

The new subroutine is easier to use, IMO, because it uses ParamArrays. It
*replaces* the contents of the original arrays. That may be good or bad
depending on one's intent.

The new code lifts the requirement that every element of every array be a
variant (because of the "blanking the duplicate element" bit in the original
code).

The new code doesn't rely on a predetermined lower bound. In the
VB6/VB.Net/legacy XL/VBA environment there is no way to force every array to
have a particular lower bound.

Watch out for line wraps. Sorry about that but trying to fix them would
make reading the code even more of a chore.

Option Explicit
Public Type LenData 'Contains paramarray index and len of array in that
position
Idx As Long
ArrLen As Long
End Type

Sub StartArrays()

Dim vaOne As Variant
Dim vaTwo As Variant
Dim vaThree As Variant
Dim vaMain As Variant
'Set up some secondary arrays
vaOne = Array(1, 2, 3, 4, 5)
vaTwo = Array(5, 6, 7)
vaThree = Array(1, 5, 8, 9)

vaOne = Array(5, 4, 3, 7, 1, 1)
vaTwo = Array(5, 5, 6, 7, 5)
vaThree = Array(8, 9, 2, 1, 2)

Dim vaFour As Variant, I As Integer
ReDim vaFour(1 To 10) As Integer
For I = LBound(vaFour) To UBound(vaFour): vaFour(I) = I: Next I

RemoveDuplicates True, vaOne, vaTwo, vaThree, vaFour
ShowResults vaOne, vaTwo, vaThree, vaFour

End Sub

Function ArrLen(anArr, Optional aDim As Integer = 1)
On Error Resume Next
ArrLen = UBound(anArr, aDim) - LBound(anArr, aDim) + 1
End Function
Function DynArrLen(anArr, LastElement As Long, Optional aDim As Integer
= 1)
On Error Resume Next
DynArrLen = LastElement - LBound(anArr, aDim) + 1
End Function
Function sortIndices(vaMain As Variant, _
ValidLastElement() As Long) As LenData()
Dim Rslt() As LenData, I As Long, J As Long, _
Temp As LenData
ReDim Rslt(LBound(vaMain) To UBound(vaMain))
For I = LBound(vaMain) To UBound(vaMain)
Rslt(I).Idx = I
Rslt(I).ArrLen = DynArrLen(vaMain(I), ValidLastElement(I))
Next I
'Bubble sort the array containing the lengths of the _
original arrays
For I = LBound(Rslt) To UBound(Rslt) - 1
For J = I + 1 To UBound(Rslt)
If Rslt(I).ArrLen > Rslt(J).ArrLen Then
Temp = Rslt(I)
Rslt(I) = Rslt(J)
Rslt(J) = Temp
End If
Next J
Next I
sortIndices = Rslt
End Function
Sub removeElement(ByRef origArr, ByRef ValidLastElement As Long, _
Idx As Long)
origArr(Idx) = origArr(ValidLastElement)
ValidLastElement = ValidLastElement - 1
End Sub
Sub shrinkArray(ByRef anArr, ByVal NewLastElementIdx)
If NewLastElementIdx < LBound(anArr) Then
Erase anArr
Else
ReDim Preserve anArr(LBound(anArr) To NewLastElementIdx)
End If
Exit Sub
End Sub
Sub RemoveDuplicates(attemptRebalance As Boolean, ParamArray InArr())
'Returns the arrays with duplicates removed. One instance of _
every duplicate will be returned, though which array it will be _
in is not predetermined.
'WARNING: Every argument to this procedure must be a variant _
======= which contains an array, i.e., of the nature _
Dim x:redim x(..) w/ or w/o an explicit type or _
Dim x:x=array(...) It cannot be something declared as an array, _
i.e., it cannot be Dim({bounds}) or Dim x() w/ or w/o an _
explicit type. If one of the arguments is an array (as opposed _
to a variant that contains an array) the results will be _
catastrophic. The returned data will be corrupt and a 2nd call _
on this routine will corrupt various memory structures. Excel _
will crash and ask to talk to mama.
Dim vaMain()
vaMain = InArr
'vaMain is used solely because I couldn't figure out how to pass _
Inarr to a procedure such as sortIndices. It appears that there _
are several undocumented but compiler-enforced restrictions on a _
paramarray's use.
If Not IsArray(vaMain) Then Exit Sub

Dim I As Long
For I = LBound(vaMain) To UBound(vaMain)
If Not IsArray(vaMain(I)) Then Exit Sub
Next I

'Here's how we 'remove' duplicates. Use 2 helper arrays. The _
first, SortedIndices maintains a list of the indices of the _
actual arrays based on their respective array sizes. Hence _
the array pointed to by SortedIndices(0) is the smallest array. _
The 2nd helper array is the ValidLastElement. It has one entry _
for each actual array. This entry is the index of the last _
valid element in that array. It's how we 'remove' duplicates _
(see below).
'The overall algorithm is as follows: _
Go through each array starting with the smallest array - _
indicated by the first element in sortedIndices doing the _
following: _
For every element in this 'master' array -- upto the _
last-valid-element pointer: _
Search every array including the master array and _
'remove' the element, if found, from the array being searched. _
An element is 'removed' using the foll. technique: _
Replace that element with the current _
last-valid-element and decrement the pointer by 1. _
When, the master array is completely analyzed, just do a redim _
preserve and the dups will be gone from that array. _
This technique should dramatically speed up the procedure. A _
secondary speed benefit will be that when we are going through an _
array we only have to go up to the current ValidLastElement.
Dim ValidLastElement() As Long
ReDim ValidLastElement(LBound(vaMain) To UBound(vaMain))
Dim J As Long, K As Long, KK As Long
For I = LBound(ValidLastElement) To UBound(ValidLastElement)
ValidLastElement(I) = UBound(vaMain(I))
Next I
DoRebalance:
Dim sortedIndices() As LenData
sortedIndices = sortIndices(vaMain, ValidLastElement)
For I = LBound(sortedIndices) To UBound(sortedIndices) - 1
'the last array does not have to be checked; just _
redimmed correctly.
For K = LBound(vaMain(sortedIndices(I).Idx)) _
To ValidLastElement(sortedIndices(I).Idx)
'optimized down from _
UBound(vaMain(sortedIndices(I).Idx))
For J = UBound(sortedIndices) To I Step -1
If J = I Then
KK = K + 1
Else
KK = LBound(vaMain(sortedIndices(J).Idx))
End If
Do While KK <= ValidLastElement(sortedIndices(J).Idx)
'optimized down from UBound(vaMain(sortedIndices
(J).Idx))
'Cannot use a For loop since ValidLastElement(...) _
may/will be modified within the loop
If vaMain(sortedIndices(I).Idx)(K) = _
vaMain(sortedIndices(J).Idx)(KK) Then
removeElement vaMain(sortedIndices(J).Idx), _
ValidLastElement(sortedIndices(J).Idx), KK
If J = LBound(sortedIndices) Then
ElseIf attemptRebalance _
And DynArrLen(vaMain(sortedIndices(J).Idx), _
ValidLastElement(sortedIndices(J).Idx))
_
< DynArrLen(vaMain(sortedIndices(J -
1).Idx), _
ValidLastElement(sortedIndices(J -
1).Idx)) Then
Debug.Print "Rebalance: I=" & I _
& ", sortedindices(i).idx=" & sortedIndices
(I).Idx _
& ", J=" & J _
& ", sortedindices(j).idx=" & sortedIndices
(J).Idx _
& ", sortedindices(j-1).idx=" &
sortedIndices(J - 1).Idx _
& ",ValidLastElement(sortedIndices(J).idx)
=" _
& ValidLastElement(sortedIndices(J).Idx)
_
& ",ValidLastElement(sortedIndices(J-
1).idx=" _
& ValidLastElement(sortedIndices(J - 1).Idx)
GoTo DoRebalance '<<<<<
End If
End If
KK = KK + 1
Loop
Next J

Next K
shrinkArray vaMain(sortedIndices(I).Idx), _
ValidLastElement(sortedIndices(I).Idx)
'OK to shrink (optimize) because this array will never be
changed
Next I
shrinkArray vaMain(sortedIndices(UBound(sortedIndices)).Idx), _
ValidLastElement(sortedIndices(UBound(sortedIndices)).Idx)
For I = LBound(vaMain) To UBound(vaMain)
Erase InArr(I): InArr(I) = vaMain(I)
Next I
End Sub

Sub ShowResults(ParamArray vaMain())

Dim I As Long, J As Long

For I = LBound(vaMain) To UBound(vaMain)
If ArrLen(vaMain(I)) > 0 Then
For J = LBound(vaMain(I)) To UBound(vaMain(I))
Debug.Print I, J, vaMain(I)(J)
Next J
Else
Debug.Print I, "No elements"
End If
Debug.Print "---------------"
Next I
End Sub


--
Regards,

Tushar Mehta
www.tushar-mehta.com
Excel, PowerPoint, and VBA add-ins, tutorials
Custom MS Office productivity solutions
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top