String Arrays

C

chuckles

I need a utility that has to do an enormous number of string comparisons in
arrays based on a text file. I’ve written something in VBA that does what I
need but it is painfully slow. I think the bulk of the time is spent on the
string searching itself.

This is the function I’m using:

pblic Function SequentialSearchStringArray(ByRef sArray() As String, ByVal
sFind As String) As Long
Dim i As Long
Dim iLBound As Long
Dim iUBound As Long
Dim astrFilter() As String

'Quick check - filter the current array to see if the value matches.
astrFilter = Filter(sArray, sFind)

'Unfortuantly this doesn't tell us where the match is so we then have to
do a search to find it
If UBound(astrFilter) < 0 Then
SequentialSearchStringArray = -1
Else
iLBound = LBound(sArray)
iUBound = UBound(sArray)

For i = iLBound To iUBound
If sArray(i) = sFind Then SequentialSearchStringArray = i: Exit
Function
Next i
End If

Can anyone come up with a way of really speeding this up? Is there a way of
treating each string as array of ints or something rather than doing the
endless casting. Ideas, code re-writes etc, greatly appreciated.
 
B

Bernie Deitrick

Chuckles,

The general idea is to:

1: Read your strings from the text file and write them to an array
2: Do a bubble sort on the array to get them into alphabetical order, if the
file was not already sorted
3: Use a binary search algorithm to search your array for the match

I have written code that does the above with the exception of step 2 (since
I have a sorted file). See the code below, which you can modify as needed
(And which you _will_ need to modify - the binary search also returns some
flags (B and N), since it is written to look for 'B'eginning partial matches
and 'N'o matches, but you'll get the idea.) Some of the variables may not be
needed, but I'm too lazy to sort out which ones actually are needed....

HTH,
Bernie
MS Excel MVP

Option Explicit
Option Base 1

Public DictArray() As String
Public myFCount As Integer
Public myFound() As String
Public myTemp As String
Public Counter As Double
Public FoundPath As Boolean
Public tmpStr As String
Public myRet As Long
Public FileLoaded As Boolean

Sub LoadFile()
'This subroutine loads the dictionary file into an array
Dim FileNum As Integer

'Check to see if the file is already in memory
If FileLoaded Then Exit Sub

'First Get Next Available File Handle Number
FileNum = FreeFile()
'Open Dictionary Text File For Input
Open ThisWorkbook.Path & "\" & Range("DictName").Value For Input As #FileNum
'Set The Counter to 1
Counter = 0
'Loop Until the End Of File Is Reached
Do While Seek(FileNum) <= LOF(FileNum)
'Store One Line Of Text From File To array
Counter = Counter + 1
ReDim Preserve DictArray(Counter)
Line Input #FileNum, DictArray(Counter)
'Increment the Counter By 1
'Start Again At Top Of 'Do While' Statement
Loop
'Close The Open Text File
Close

'Set the flag to indicate the file is loaded
FileLoaded = True
End Sub


Function BinaryWordMatch(FindVal As Variant, _
ByVal FirstIndex As Long, _
ByVal LastIndex As Long) As Variant
'Uses binary search routine to compare words to dictionary
'Assumes dictionary file as read into array is sorted in ascending order
Dim TempVal As String
Dim lngIndex As Long
Dim lngIndexPrevious As Long

'Check for obvious cases:
'The string is less than the first entry
'of the dictionary
If (FindVal < DictArray(FirstIndex)) Then
'If the letters begin the first word, then indicate and continue
If Left(DictArray(FirstIndex), Len(FindVal)) = FindVal Then
BinaryWordMatch = "B"
Exit Function
End If
'If the letters don't begin the first word, then
'FindVal is not in the list, so can stop looking
BinaryWordMatch = "N"
Exit Function
End If

'Obvious case #2
'The word is beyond the last word
If (FindVal > DictArray(LastIndex)) Then
'FindVal is not in the list, so stop looking
BinaryWordMatch = "N"
Exit Function
End If

'Obvious case #3 - the word is the last in the dictionary
If FindVal = DictArray(LastIndex) Then
BinaryWordMatch = LastIndex
Exit Function
End If

'Obvious case #4 - the word is the first in the dictionary
If FindVal = DictArray(FirstIndex) Then
'Also check to see if it also starts the next entry
If Left(DictArray(FirstIndex + 1), Len(FindVal)) = FindVal Then
BinaryWordMatch = BinaryWordMatch & " " & "B"
Exit Function
End If
BinaryWordMatch = FirstIndex
Exit Function
End If

'now more dificult cases!

lngIndexPrevious = -1

Do
lngIndex = Int((FirstIndex + LastIndex) / 2)

'If lngIndex is the same as the previous time,
'we have converged without finding value
'First check for word beginnings, if none found then exit loop

If lngIndex = lngIndexPrevious Then

If Left(DictArray(lngIndex - 1), Len(FindVal)) = FindVal Then
'search string begins the previous word
BinaryWordMatch = "B"
Exit Function
End If

If Left(DictArray(lngIndex + 1), Len(FindVal)) = FindVal Then
'search string begins the next word
BinaryWordMatch = "B"
Exit Function
End If

If Left(DictArray(lngIndex), Len(FindVal)) = FindVal Then
'search string begins the current word
BinaryWordMatch = "B"
Exit Function
End If

Exit Do
End If

lngIndexPrevious = lngIndex

TempVal = CStr(DictArray(lngIndex))

'Has the value been found?
If TempVal = FindVal Then
BinaryWordMatch = lngIndex
'Check to see if it also starts the next dictionary entry
If Left(DictArray(lngIndex + 1), Len(FindVal)) = FindVal Then
BinaryWordMatch = BinaryWordMatch & " " & "B"
Exit Function
End If
Exit Function
End If

'Determine which half of list to discard?
If TempVal < FindVal Then
FirstIndex = lngIndex
Else
LastIndex = lngIndex
End If
Loop

'Indicate that no match of any type has been found
BinaryWordMatch = "N"

End Function
 
K

keepITcool

Alternatively to Bernie's solution..

use a dictionary object from the Scripting runtime library.
iso storing your data in arrays you store them in dictionaries.
iso search the array you simply retrieve the data (and position)
by their key.

I've found Dictionary to be faster than vba's Collection.
plus it has advantage of CaseSensitivity, the possibility to read (and
change) the index. Also you can fill an array from a dictionary
by retrieving either the Keys or Index arrays, unlike Collections
where you'd need a loop.





--
keepITcool
| www.XLsupport.com | keepITcool chello nl | amsterdam


chuckles wrote :
 
C

chuckles

Many thanks both of you. Will try out these techniques and see how I get on.
Chuckles
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top