M
Microsoft
Hi,
I'm populating a spreadsheet from some databases with quite a bit of
reference data and some formulas and data for analysis. For 80 price lines,
I get about 50 sheets of different reference data calculated from numerous
services and data sources. This results in about 30,000 named cells. The
names are the trouble. They have to be global to the workbook, so they are
all stored in the same "dictionary". The first 5-10,000 names are no
problem, but hereafter performance is O(n^2). Up to 30 pricelines, I can
populate the spreadsheet with one line per second, but for 80 lines it takes
about 5 minutes. I start by putting the cell names in a SafeArray and
populating the spreadsheet one sheet at a time. I can fill 18000 cells in
about 4 seconds. I then call CreateNames for each column. With 1000 lines
and 18 columns, I can create the first 1000 names in 30 ms, the second 1000
names in 60 ms, then 131 ms, 260 ms, 471 ms, 711 ms, .... and the last 1000
names in 4256 ms, or logarithmic longer times. I figured that the
"dictionary" object that holds names in excel is a hash map and that similar
names results in long lists to be traversed for existence, so I've tried to
prefix the names with a unique prefix to aid the hash function to evenly
distribute the map, put this only helped marginally. So, here is the
question :
Is it possible to replace the "dictionary" object that holds the names in
Excel with another COM object, so I could base it on a binary search tree,
or the like ? If not, is it possible to tune the "dictionary" object for
better performance with large volumns of data ?
I can seem to find anything in the COM interfaces or the worksheet
functions, nor the other registered office interfaces. I've also checked all
the DLLs, like mso, for usefull functions, but without luck. I've tried to
turn off all visual, events, history, and calculation effects. The solution
has to work for both Excel 2000 and XP, so I can't disable the error
checking for the 2000 version through the ErrorCheckingOptions interface (
it does give that much anyway in XP ). The Excel.exe process uses more than
99% of the time and my application uses less than 1%. Data is loaded in
SafeArrays and formatting is done on discontinuous ranges without any
problems. Please help as I'm running out of ideas.
Sincerely, Daniel von Tabouillot.
I'm populating a spreadsheet from some databases with quite a bit of
reference data and some formulas and data for analysis. For 80 price lines,
I get about 50 sheets of different reference data calculated from numerous
services and data sources. This results in about 30,000 named cells. The
names are the trouble. They have to be global to the workbook, so they are
all stored in the same "dictionary". The first 5-10,000 names are no
problem, but hereafter performance is O(n^2). Up to 30 pricelines, I can
populate the spreadsheet with one line per second, but for 80 lines it takes
about 5 minutes. I start by putting the cell names in a SafeArray and
populating the spreadsheet one sheet at a time. I can fill 18000 cells in
about 4 seconds. I then call CreateNames for each column. With 1000 lines
and 18 columns, I can create the first 1000 names in 30 ms, the second 1000
names in 60 ms, then 131 ms, 260 ms, 471 ms, 711 ms, .... and the last 1000
names in 4256 ms, or logarithmic longer times. I figured that the
"dictionary" object that holds names in excel is a hash map and that similar
names results in long lists to be traversed for existence, so I've tried to
prefix the names with a unique prefix to aid the hash function to evenly
distribute the map, put this only helped marginally. So, here is the
question :
Is it possible to replace the "dictionary" object that holds the names in
Excel with another COM object, so I could base it on a binary search tree,
or the like ? If not, is it possible to tune the "dictionary" object for
better performance with large volumns of data ?
I can seem to find anything in the COM interfaces or the worksheet
functions, nor the other registered office interfaces. I've also checked all
the DLLs, like mso, for usefull functions, but without luck. I've tried to
turn off all visual, events, history, and calculation effects. The solution
has to work for both Excel 2000 and XP, so I can't disable the error
checking for the 2000 version through the ErrorCheckingOptions interface (
it does give that much anyway in XP ). The Excel.exe process uses more than
99% of the time and my application uses less than 1%. Data is loaded in
SafeArrays and formatting is done on discontinuous ranges without any
problems. Please help as I'm running out of ideas.
Sincerely, Daniel von Tabouillot.