Sorting a string array

D

Deke

I have been reading thru various post on the forum about sorting large arrays
and most of the post seem to be about integer arrays.

I have found some links and comments about string arrays, but none of them
are exactly what I'm looking for and I'm having some trouble re-writing the
code to get them to work.

Basically, I'm needing a routine that can sort a string array containing
approx 1.5 million elements. As I said, I have had a look around and a quick
sort seems to be the best method to use for this size of array.

I have the code for a quick sort written by Jim Rech in 1998, but this is
for a 2 dimensional array, where mine is just a single dimension.

If I can get the array sorted, then this is going to make processing of the
input file alot quicker.

Hopefully some one can help.

Many thanks in advance...
 
R

RB Smissaert

Function QuickSortString(arrString() As String, _
Optional lLow1 = -1, _
Optional lhigh1 = -1)

'Dimension variables
Dim lLow2 As Long
Dim lhigh2 As Long
Dim strVal1 As String
Dim strVal2 As String

'If first time, get the size of the array to sort
If lLow1 = -1 Then
lLow1 = LBound(arrString, 1)
End If

If lhigh1 = -1 Then
lhigh1 = UBound(arrString, 1)
End If

'Set new extremes to old extremes
lLow2 = lLow1
lhigh2 = lhigh1

'Get value of array item in middle of new extremes
strVal1 = arrString((lLow1 + lhigh1) / 2)

'Loop for all the items in the array between the extremes
While (lLow2 <= lhigh2)

'Find the first item that is greater than the mid-point item
While (arrString(lLow2) < strVal1 And lLow2 < lhigh1)
lLow2 = lLow2 + 1
Wend

'Find the last item that is less than the mid-point item
While (arrString(lhigh2) > strVal1 And lhigh2 > lLow1)
lhigh2 = lhigh2 - 1
Wend

'If the new 'greater' item comes before the new 'less' item, swap them
If (lLow2 <= lhigh2) Then
strVal2 = arrString(lLow2)
arrString(lLow2) = arrString(lhigh2)
arrString(lhigh2) = strVal2

'Advance the pointers to the next item
lLow2 = lLow2 + 1
lhigh2 = lhigh2 - 1
End If
Wend

'Iterate to sort the lower half of the extremes
If (lhigh2 > lLow1) Then Call QuickSortString(arrString, lLow1, lhigh2)

'Iterate to sort the upper half of the extremes
If (lLow2 < lhigh1) Then Call QuickSortString(arrString, lLow2, lhigh1)

QuickSortString = arrString

End Function


There is actually a way to do this much faster, but that needs coding in
VB6. Will see if I can post it later.


RBS
 
R

RB Smissaert

This one is a lot faster, but needs coding in VB6 as it needs all the fast
compiler options.
Don't think it is that much faster if you code in VBA.
Got this code from Olaf Schmidt.

Option Explicit

'(native compiled, all options this Sort needs 0.19 sec for 20000
'Random Strings of 200'er length on a PIII 500).
Private Type SAFEARRAY1D
cDims As Integer
fFeatures As Integer
cbElements As Long
cLocks As Long
pvData As Long
cElements As Long
lLbound As Long
End Type

Private Declare Sub BindArray Lib "kernel32" Alias "RtlMoveMemory" _
(pArr() As Any, PSrc&, Optional ByVal cb& = 4)
Private Declare Sub ReleaseArray Lib "kernel32" Alias "RtlMoveMemory" _
(pArr() As Any, _
Optional PSrc& = 0, _
Optional ByVal cb& = 4)

Public Sub QSort1DP(Arr() As String)

'Dim Compare As String
Dim i As Long
Dim j As Long
Dim Lo As Long
Dim Hi As Long
Dim StPtr As Long
Dim V(0) As String
Dim pV() As Long
Dim sapV As SAFEARRAY1D
Dim pArr() As Long
Dim sapArr As SAFEARRAY1D
Dim p As Long
Dim StSize As Long
Dim StLo() As Long
Dim StHi() As Long

StSize = 255
ReDim StLo(StSize)
ReDim StHi(StSize) 'init the stack

On Error Resume Next
'spans a Long-Array (pArr()) over the StringPointers in Arr()
sapArr.cDims = 1
sapArr.cbElements = 4 'Bytes used by each StrPointer
sapArr.pvData = VarPtr(Arr(0))
sapArr.cElements = UBound(Arr) - LBound(Arr) + 1

If Err Then
Err.Clear
Exit Sub 'Arr was not initialized
End If

On Error GoTo 0 'switch off Err-Handler for speed-reasons
BindArray pArr, VarPtr(sapArr)

'another Array, used to hold only one single String,
'respective its pointer for reasons of comparing inside the algo
sapV.cDims = 1
sapV.cbElements = 4
sapV.pvData = VarPtr(V(0))
sapV.cElements = 1
BindArray pV, VarPtr(sapV)

StPtr = 1 'init the StackPointer
StLo(0) = LBound(Arr)
StHi(0) = UBound(Arr)

Do
StPtr = StPtr - 1
Lo = StLo(StPtr)
Hi = StHi(StPtr)
If Hi - Lo < 12 Then 'MinSort
For Lo = Lo To Hi - 1
j = Lo
For i = Lo + 1 To Hi
If Arr(i) < Arr(j) Then j = i
Next i
If j <> Lo Then
p = pArr(j): pArr(j) = pArr(Lo): pArr(Lo) = p
End If
Next Lo
Else 'QSort
Do
i = Lo: j = Hi
pV(0) = pArr((Lo + Hi) \ 2)
Do
Do While Arr(i) < V(0)
i = i + 1
Loop
Do While Arr(j) > V(0)
j = j - 1
Loop
If i <= j Then
p = pArr(i)
pArr(i) = pArr(j)
pArr(j) = p
i = i + 1
j = j - 1
End If
Loop While i <= j

If j - Lo < Hi - i Then
If i < Hi Then
StLo(StPtr) = i
StHi(StPtr) = Hi
StPtr = StPtr + 1
If StPtr >= StSize Then
StSize = StSize + StSize
ReDim Preserve StLo(StSize)
ReDim Preserve StHi(StSize)
End If
End If
Hi = j
Else
If Lo < j Then
StLo(StPtr) = Lo
StHi(StPtr) = j
StPtr = StPtr + 1
If StPtr >= StSize Then
StSize = StSize + StSize
ReDim Preserve StLo(StSize)
ReDim Preserve StHi(StSize)
End If
End If
Lo = i
End If
Loop While Lo < Hi
End If
Loop While StPtr

pV(0) = 0 'don't dispose the current String-Content of V(0)
ReleaseArray pV 'release the Array-Mapping between V() and pV()
ReleaseArray pArr 'relase the Array-Mapping between Arr() and pArr()

End Sub


RBS
 
B

bart.smissaert

It is in fact a lot faster in VBA as well, have just tested.
It needs some adjustments though to work with a 1-based array, which
I haven't worked out yet. Fine though with a 0-based array.

RBS
 
D

Deke

Hi,

Thank for the reply. Got both the bits of code to work perfectly first
time. The VB script is a lot faster and is exactly what I'm needing, I've
got the processing time down from 20-30 min's to seconds.

Again, thank's for all your help, it was exactly what I was needing...
 
R

RB Smissaert

This will work with both an 0-based array and a 1-based array:

Public Sub QSort1DStringArrayP(arrString() As String)

'Dim Compare As String
Dim i As Long
Dim j As Long
Dim Lo As Long
Dim Hi As Long
Dim StPtr As Long
Dim V(0) As String
Dim pV() As Long
Dim sapV As SAFEARRAY1D
Dim pArr() As Long
Dim sapArr As SAFEARRAY1D
Dim p As Long
Dim StSize As Long
Dim StLo() As Long
Dim StHi() As Long

StSize = 255
ReDim StLo(StSize)
ReDim StHi(StSize) 'init the stack

On Error Resume Next
'spans a Long-Array (pArr()) over the StringPointers in Arr()
sapArr.cDims = 1
sapArr.cbElements = 4 'Bytes used by each StrPointer
sapArr.lLbound = LBound(arrString)
sapArr.pvData = VarPtr(arrString(sapArr.lLbound))
sapArr.cElements = UBound(arrString) - LBound(arrString) + 1

If Err Then
Err.Clear
Exit Sub 'Arr was not initialized
End If

On Error GoTo 0 'switch off Err-Handler for speed-reasons
BindArray pArr, VarPtr(sapArr)

'another Array, used to hold only one single String,
'respective its pointer for reasons of comparing inside the algo
sapV.cDims = 1
sapV.cbElements = 4
sapV.pvData = VarPtr(V(0))
sapV.cElements = 1
BindArray pV, VarPtr(sapV)

StPtr = 1 'init the StackPointer
StLo(0) = LBound(arrString)
StHi(0) = UBound(arrString)

Do
StPtr = StPtr - 1
Lo = StLo(StPtr)
Hi = StHi(StPtr)
If Hi - Lo < 12 Then 'MinSort
For Lo = Lo To Hi - 1
j = Lo
For i = Lo + 1 To Hi
If arrString(i) < arrString(j) Then j = i
Next i
If j <> Lo Then
p = pArr(j): pArr(j) = pArr(Lo): pArr(Lo) = p
End If
Next Lo
Else 'QSort
Do
i = Lo: j = Hi
pV(0) = pArr((Lo + Hi) \ 2)
Do
Do While arrString(i) < V(0)
i = i + 1
Loop
Do While arrString(j) > V(0)
j = j - 1
Loop
If i <= j Then
p = pArr(i)
pArr(i) = pArr(j)
pArr(j) = p
i = i + 1
j = j - 1
End If
Loop While i <= j

If j - Lo < Hi - i Then
If i < Hi Then
StLo(StPtr) = i
StHi(StPtr) = Hi
StPtr = StPtr + 1
If StPtr >= StSize Then
StSize = StSize + StSize
ReDim Preserve StLo(StSize)
ReDim Preserve StHi(StSize)
End If
End If
Hi = j
Else
If Lo < j Then
StLo(StPtr) = Lo
StHi(StPtr) = j
StPtr = StPtr + 1
If StPtr >= StSize Then
StSize = StSize + StSize
ReDim Preserve StLo(StSize)
ReDim Preserve StHi(StSize)
End If
End If
Lo = i
End If
Loop While Lo < Hi
End If
Loop While StPtr

pV(0) = 0 'don't dispose the current String-Content of V(0)
ReleaseArray pV 'release the Array-Mapping between V() and pV()
ReleaseArray pArr 'relase the Array-Mapping between Arr() and pArr()

End Sub

The declarations etc. are just the same as before.
It will be faster if you could code in VB6 (worth buying) as you have all
the fast compiler options
that are not available in VBA, but even in VBA this is very fast.


RBS
 
D

Deke

I was able to re-write my code to use a 0-based array, so the original code
works perfectly.

Thanks again for youe help, got the processing time for the file down from
30-ish minutes to 3 minutes, which is alot more reasonable.
 
D

Deke

I was able to re-write my code to use a 0-based array, so the original code
works perfectly.

Thanks again for youe help, got the processing time for the file down from
30-ish minutes to 3 minutes, which is alot more reasonable.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top