Sorting in VBA (andVB)

H

Howard Kaikow

I guess that 29 Feb is an odd day on which to have a birth announcement.

Up until a few years ago, I used to commit the sin of recommending that
folkes use WordBasic.SortArray for sorting, well a number of years ago, I
started some investigations and had an epiphany. So here's the birth
announcement of a program that I finally completed. It is useful in
determining a better way to sort in VB/VBA.

I posted the program and documentation at
http://www.standards.com/Sorting/SortPerformanceComparison-Description.html

Code is also posted for most of the sorting routines.
 
J

Jay Freedman

Thank you, Howard. That's very impressive.

I'll be looking at the code as I get some spare time (though you know
how often that happens<g>).
 
J

Jonathan West

Hi Howard,

I've taken a very quick look at a few of the routines, and it has prompted
me to ask a question. When sorting an array of strings, what mechanism are
you using to compare the strings?

In the routines I looked at, from the modSortHK.bas module, it appears that
you are doing a straightforward "if string1 > string2 Then" comparison.

The problem with this is that while it might be very fast, this method
doesn't necessarily give a sort order that is very useful. For instance, I
doubt that most people would expect the four items "bath", "Bath", "cattle",
"Cattle" to be sorted "Bath", "Cattle", "bath", "cattle". That is the sort
order which results from a straightforward string comparison. The
InsertionString routine in your modSortHK.bas module gives precisely this
sort order.

WordBasic.SortArray has a more sophisticated comparison mechanism that does
an alphabetic comparison, so that sort order for those four items is "bath",
"Bath", "cattle", "Cattle", which is much more what users might expect.

I'm therefore concerned that your performance comparisons might (quite
unintentionally) be misleading because they are not comparing algorithms
which have the same output.

If you want to produce a performance comparison of WordBasic.SortArray and
other algorithms, then it really is necessary to ensure that the other
algorithms produce the same output. In principle, this should not be too
hard to achieve. All that is required is that you write a suitable
CompareStrings routine, which returns a boolean value depending on the
relative position of two strings in a SortArray-style algorithm. Such a
routine could then by dropped into the various string sorting algorithms as
a direct replacement of the "string1 > string2" comparison you have now.

Your choice of string data for your comparison (i.e. strings containing only
numeric characters) masks the fact that different comparison algorithms are
being used.

It might be that even with a more sophisticated string comparison that these
other algorithms (Quicksort especially) might still be faster than
WordBasic.SortArray, but until you do a like-for-like comparison, there is
no way of knowing whether the apparent poor performance of
WordBasic.SortArray is due to an inefficient sorting algorithm or more
computer-intensive string comparison.

A further check might need to be made of the sorting algorithms in Excel
that you are including in the comparison, to see what sort order they
produce with a wider range of string input data.

--
Regards
Jonathan West - Word MVP
http://www.multilinker.com
Please reply to the newsgroup
 
H

Howard Kaikow

One of the difficulties in doing such a project is translating the C to VB.
All the algorithms seem to work, but I have no way of knowing how faithful
were my translations from C to VB.

I guess if there are any bugs, you folkes will let me know.
 
H

Howard Kaikow

Yes, but I specifically chose to use numeric data as that permits comparison
among sorting algorithms for a given data type.

Modifying the algorithms to handle strings as you indicate would have the
same effect in ALL the string algorithms, so it should not affect the
performance comparisons. Indeed, I'd guess that if I optimized the code I
used to handle such comparisions, WordBasic.SortArray would come out even
worse.

And, remember, I am coding without using memory copy API, etc., which I
expect are being used by Word/Excel internally.

Even if this biases the test for Strings against WordBasic.SortArray, it has
no impact on the type Double and type Long.
The following is a sample result for type Double and Long.
WordBasic.SortArray is using an ineffective algorithm, e.g., compare with
the results using either VB 6 ListBox sort or Excel Sort.

Sort Performance Comparison(Start: 0:37:54 on 1 Mar 2004)
Set randomize seed: Yes
Algorithm Comparison(Start: 0:38:14 on 1 Mar 2004)
Data type: Long
Display data: No
Use integer data: Yes
Maximum integer value: 999
Number of data items: 3000
Number of samples: 10
Order of source data: Random
Reuse data: No
Verify sort: No
Source: Array
Target: Array
Counting Sort(S6.17): Average 1 milliseconds(13/10)
Heap Sort(S9.7): Average 8 milliseconds(80/10)
Quick Sort(S7.4): Average 3 milliseconds(31/10)
Shell Sort(S6.5): Average 6 milliseconds(61/10)
WordBasic.SortArray: Average 2100 milliseconds(21009/10)
Algorithm Comparison(End: 0:38:39 on 1 Mar 2004, timed/elapsed: 21196/24819
milliseconds, 85.40%)
Algorithm Comparison(Start: 0:38:46 on 1 Mar 2004)
Data type: Double
Display data: No
Use integer data: Yes
Maximum integer value: 999
Number of data items: 3000
Number of samples: 10
Order of source data: Random
Reuse data: No
Verify sort: No
Source: Array
Target: Array
Counting Sort(S6.17): Average 2 milliseconds(27/10)
Heap Sort(S9.7): Average 8 milliseconds(86/10)
Quick Sort(S7.4): Average 3 milliseconds(38/10)
Shell Sort(S6.5): Average 6 milliseconds(69/10)
WordBasic.SortArray: Average 2122 milliseconds(21221/10)
Algorithm Comparison(End: 0:39:08 on 1 Mar 2004, timed/elapsed: 21443/21735
milliseconds, 98.66%)
Sort Performance Comparison(End: 0:39:13 on 1 Mar 2004)

--
http://www.standards.com/; See Howard Kaikow's web site.
Jonathan West said:
Hi Howard,

I've taken a very quick look at a few of the routines, and it has prompted
me to ask a question. When sorting an array of strings, what mechanism are
you using to compare the strings?

In the routines I looked at, from the modSortHK.bas module, it appears that
you are doing a straightforward "if string1 > string2 Then" comparison.

The problem with this is that while it might be very fast, this method
doesn't necessarily give a sort order that is very useful. For instance, I
doubt that most people would expect the four items "bath", "Bath", "cattle",
"Cattle" to be sorted "Bath", "Cattle", "bath", "cattle". That is the sort
order which results from a straightforward string comparison. The
InsertionString routine in your modSortHK.bas module gives precisely this
sort order.

WordBasic.SortArray has a more sophisticated comparison mechanism that does
an alphabetic comparison, so that sort order for those four items is "bath",
"Bath", "cattle", "Cattle", which is much more what users might expect.

I'm therefore concerned that your performance comparisons might (quite
unintentionally) be misleading because they are not comparing algorithms
which have the same output.

If you want to produce a performance comparison of WordBasic.SortArray and
other algorithms, then it really is necessary to ensure that the other
algorithms produce the same output. In principle, this should not be too
hard to achieve. All that is required is that you write a suitable
CompareStrings routine, which returns a boolean value depending on the
relative position of two strings in a SortArray-style algorithm. Such a
routine could then by dropped into the various string sorting algorithms as
a direct replacement of the "string1 > string2" comparison you have now.

Your choice of string data for your comparison (i.e. strings containing only
numeric characters) masks the fact that different comparison algorithms are
being used.

It might be that even with a more sophisticated string comparison that these
other algorithms (Quicksort especially) might still be faster than
WordBasic.SortArray, but until you do a like-for-like comparison, there is
no way of knowing whether the apparent poor performance of
WordBasic.SortArray is due to an inefficient sorting algorithm or more
computer-intensive string comparison.

A further check might need to be made of the sorting algorithms in Excel
that you are including in the comparison, to see what sort order they
produce with a wider range of string input data.

--
Regards
Jonathan West - Word MVP
http://www.multilinker.com
Please reply to the newsgroup



Howard Kaikow said:
I guess that 29 Feb is an odd day on which to have a birth announcement.

Up until a few years ago, I used to commit the sin of recommending that
folkes use WordBasic.SortArray for sorting, well a number of years ago, I
started some investigations and had an epiphany. So here's the birth
announcement of a program that I finally completed. It is useful in
determining a better way to sort in VB/VBA.

I posted the program and documentation at
http://www.standards.com/Sorting/SortPerformanceComparison-Description.html
 
J

Jonathan West

Howard Kaikow said:
Yes, but I specifically chose to use numeric data as that permits comparison
among sorting algorithms for a given data type.

Modifying the algorithms to handle strings as you indicate would have the
same effect in ALL the string algorithms, so it should not affect the
performance comparisons. Indeed, I'd guess that if I optimized the code I
used to handle such comparisions, WordBasic.SortArray would come out even
worse.

Since an alphabetic comparison is more CPU-intensive than an ASCII
comparison, WordBasic.SortArray would come out better relative to all of the
coded solutions, but the coded solutions should remain in the same order
relative to each other. WordBasic.SortArray might still come out poorly
overall, but that could only be deteremined by a realistic test.
And, remember, I am coding without using memory copy API, etc., which I
expect are being used by Word/Excel internally.

Even if this biases the test for Strings against WordBasic.SortArray, it has
no impact on the type Double and type Long.

That is true. But my experience is that the vast majority of sorts in
general-purpose business computing are alphabetic sorts of strings, and
eliminating this kind of sort from your figures significantly reduces their
practical value.
 
H

Howard Kaikow

I made another version of the code that uses Option Compare Text in all
modules.
Although that did slow down the sorts, WordBasic.SortArray still needs to
hang its head in shame.






Uses no Option Compare
Option Compare Text

Bubble Sort(S6.4)
Average 5433 milliseconds(54339/10)
Average 8600 milliseconds(86006/10)

Counting Sort(S6.17)
Average 26 milliseconds(261/10)
Average 25 milliseconds(254/10)

Heap Sort(S9.7)
Average 76 milliseconds(768/10)
Average 126 milliseconds(1260/10)

Insertion Sort(S6.3)
Average 2003 milliseconds(20038/10)
Average 3768 milliseconds(37686/10)

Merge Sort(S8.3)
Average 58 milliseconds(582/10)
Average 84 milliseconds(843/10)

Quick Sort(S7.4)
Average 35 milliseconds(355/10)
Average 66 milliseconds(666/10)

Selection Sort(S6.2)
Average 1676 milliseconds(16763/10)
Average 5245 milliseconds(52452/10)

Shell Sort(S6.5)
Average 60 milliseconds(603/10)
Average 100 milliseconds(1002/10)

WordBasic.SortArray
Average 3971 milliseconds(39712/10)
Average 3924 milliseconds(39240/10)

Word Document Sort
Average 631 milliseconds(6314/10)
Average 620 milliseconds(6204/10)

VB 6 ListBox Sort
Average 151 milliseconds(1514/10)
Average 152 milliseconds(1524/10)

Excel Worksheet Sort
Average 1312 milliseconds(13124/10)
Average 1308 milliseconds(13085/10)





--
http://www.standards.com/; See Howard Kaikow's web site.
Howard Kaikow said:
Yes, but I specifically chose to use numeric data as that permits comparison
among sorting algorithms for a given data type.

Modifying the algorithms to handle strings as you indicate would have the
same effect in ALL the string algorithms, so it should not affect the
performance comparisons. Indeed, I'd guess that if I optimized the code I
used to handle such comparisions, WordBasic.SortArray would come out even
worse.

And, remember, I am coding without using memory copy API, etc., which I
expect are being used by Word/Excel internally.

Even if this biases the test for Strings against WordBasic.SortArray, it has
no impact on the type Double and type Long.
The following is a sample result for type Double and Long.
WordBasic.SortArray is using an ineffective algorithm, e.g., compare with
the results using either VB 6 ListBox sort or Excel Sort.

Sort Performance Comparison(Start: 0:37:54 on 1 Mar 2004)
Set randomize seed: Yes
Algorithm Comparison(Start: 0:38:14 on 1 Mar 2004)
Data type: Long
Display data: No
Use integer data: Yes
Maximum integer value: 999
Number of data items: 3000
Number of samples: 10
Order of source data: Random
Reuse data: No
Verify sort: No
Source: Array
Target: Array
Counting Sort(S6.17): Average 1 milliseconds(13/10)
Heap Sort(S9.7): Average 8 milliseconds(80/10)
Quick Sort(S7.4): Average 3 milliseconds(31/10)
Shell Sort(S6.5): Average 6 milliseconds(61/10)
WordBasic.SortArray: Average 2100 milliseconds(21009/10)
Algorithm Comparison(End: 0:38:39 on 1 Mar 2004, timed/elapsed: 21196/24819
milliseconds, 85.40%)
Algorithm Comparison(Start: 0:38:46 on 1 Mar 2004)
Data type: Double
Display data: No
Use integer data: Yes
Maximum integer value: 999
Number of data items: 3000
Number of samples: 10
Order of source data: Random
Reuse data: No
Verify sort: No
Source: Array
Target: Array
Counting Sort(S6.17): Average 2 milliseconds(27/10)
Heap Sort(S9.7): Average 8 milliseconds(86/10)
Quick Sort(S7.4): Average 3 milliseconds(38/10)
Shell Sort(S6.5): Average 6 milliseconds(69/10)
WordBasic.SortArray: Average 2122 milliseconds(21221/10)
Algorithm Comparison(End: 0:39:08 on 1 Mar 2004, timed/elapsed: 21443/21735
milliseconds, 98.66%)
Sort Performance Comparison(End: 0:39:13 on 1 Mar 2004)

--
http://www.standards.com/; See Howard Kaikow's web site.
Jonathan West said:
Hi Howard,

I've taken a very quick look at a few of the routines, and it has prompted
me to ask a question. When sorting an array of strings, what mechanism are
you using to compare the strings?

In the routines I looked at, from the modSortHK.bas module, it appears that
you are doing a straightforward "if string1 > string2 Then" comparison.

The problem with this is that while it might be very fast, this method
doesn't necessarily give a sort order that is very useful. For instance, I
doubt that most people would expect the four items "bath", "Bath", "cattle",
"Cattle" to be sorted "Bath", "Cattle", "bath", "cattle". That is the sort
order which results from a straightforward string comparison. The
InsertionString routine in your modSortHK.bas module gives precisely this
sort order.

WordBasic.SortArray has a more sophisticated comparison mechanism that does
an alphabetic comparison, so that sort order for those four items is "bath",
"Bath", "cattle", "Cattle", which is much more what users might expect.

I'm therefore concerned that your performance comparisons might (quite
unintentionally) be misleading because they are not comparing algorithms
which have the same output.

If you want to produce a performance comparison of WordBasic.SortArray and
other algorithms, then it really is necessary to ensure that the other
algorithms produce the same output. In principle, this should not be too
hard to achieve. All that is required is that you write a suitable
CompareStrings routine, which returns a boolean value depending on the
relative position of two strings in a SortArray-style algorithm. Such a
routine could then by dropped into the various string sorting algorithms as
a direct replacement of the "string1 > string2" comparison you have now.

Your choice of string data for your comparison (i.e. strings containing only
numeric characters) masks the fact that different comparison algorithms are
being used.

It might be that even with a more sophisticated string comparison that these
other algorithms (Quicksort especially) might still be faster than
WordBasic.SortArray, but until you do a like-for-like comparison, there is
no way of knowing whether the apparent poor performance of
WordBasic.SortArray is due to an inefficient sorting algorithm or more
computer-intensive string comparison.

A further check might need to be made of the sorting algorithms in Excel
that you are including in the comparison, to see what sort order they
produce with a wider range of string input data.

--
Regards
Jonathan West - Word MVP
http://www.multilinker.com
Please reply to the newsgroup
ago,
http://www.standards.com/Sorting/SortPerformanceComparison-Description.html
 
H

Howard Kaikow

Later today, or tomorrow, I expect to post version 1.1, which allows for the
option of a case sensitive sort.
 
H

Howard Kaikow

I posted a version today with a better set of messages to clarify things.

For example, try selecting WordBasic Sort when the case insensitive option
is unchecked.
 
H

Howard Kaikow

I've created a version of the program hat allows use of non-numeric string
data.
After I update the accompanying HTML docs, I'll post the critter, it is
version 1.2.0.

Encountered a significant VB 6 compiler issue in doing this, which I expect
to raise in the VB 6 group.

--
http://www.standards.com/; See Howard Kaikow's web site.
Jonathan West said:
Hi Howard,

I've taken a very quick look at a few of the routines, and it has prompted
me to ask a question. When sorting an array of strings, what mechanism are
you using to compare the strings?

In the routines I looked at, from the modSortHK.bas module, it appears that
you are doing a straightforward "if string1 > string2 Then" comparison.

The problem with this is that while it might be very fast, this method
doesn't necessarily give a sort order that is very useful. For instance, I
doubt that most people would expect the four items "bath", "Bath", "cattle",
"Cattle" to be sorted "Bath", "Cattle", "bath", "cattle". That is the sort
order which results from a straightforward string comparison. The
InsertionString routine in your modSortHK.bas module gives precisely this
sort order.

WordBasic.SortArray has a more sophisticated comparison mechanism that does
an alphabetic comparison, so that sort order for those four items is "bath",
"Bath", "cattle", "Cattle", which is much more what users might expect.

I'm therefore concerned that your performance comparisons might (quite
unintentionally) be misleading because they are not comparing algorithms
which have the same output.

If you want to produce a performance comparison of WordBasic.SortArray and
other algorithms, then it really is necessary to ensure that the other
algorithms produce the same output. In principle, this should not be too
hard to achieve. All that is required is that you write a suitable
CompareStrings routine, which returns a boolean value depending on the
relative position of two strings in a SortArray-style algorithm. Such a
routine could then by dropped into the various string sorting algorithms as
a direct replacement of the "string1 > string2" comparison you have now.

Your choice of string data for your comparison (i.e. strings containing only
numeric characters) masks the fact that different comparison algorithms are
being used.

It might be that even with a more sophisticated string comparison that these
other algorithms (Quicksort especially) might still be faster than
WordBasic.SortArray, but until you do a like-for-like comparison, there is
no way of knowing whether the apparent poor performance of
WordBasic.SortArray is due to an inefficient sorting algorithm or more
computer-intensive string comparison.

A further check might need to be made of the sorting algorithms in Excel
that you are including in the comparison, to see what sort order they
produce with a wider range of string input data.

--
Regards
Jonathan West - Word MVP
http://www.multilinker.com
Please reply to the newsgroup



Howard Kaikow said:
I guess that 29 Feb is an odd day on which to have a birth announcement.

Up until a few years ago, I used to commit the sin of recommending that
folkes use WordBasic.SortArray for sorting, well a number of years ago, I
started some investigations and had an epiphany. So here's the birth
announcement of a program that I finally completed. It is useful in
determining a better way to sort in VB/VBA.

I posted the program and documentation at
http://www.standards.com/Sorting/SortPerformanceComparison-Description.html
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top