B
Bas V
I have attempted again to get the text import feature of MS Excel 2007 to
work using a really long (10,402,255 lines) ascii format text data files with
two columns of fixed width fields. Despite the updates to the Excel 2007
program since I first tried this, the implementation of this feature remains
pitiful.
Here are my findings of how this feature of Excel behaves:
1. If you set up the text import to start on the first line of the
text source file and put them into your worksheet starting on cell A1, then
the importer will capture the first 1048576 lines and put them all into your
worksheet - so it works OK there and it is fast - about 14 seconds on my
machine. 1048576 is the maximum number of rows that you can accommodate on
a single worksheet in Excel 2007.
2. If you set up the importer to start on line 30,001 (which skips the
first 30000 lines), then the importer captures all the lines up to line
1048577 in the source text file. This means you will have two columns of
data up to row 1018577 in your worksheet. This is the first indication that
the text file importer cannot properly access lines in a source text file
beyond a maximum of the first 1048576 (2^20) lines. Attempts to start on
another line (e.g. 65001 will fill down to row 983577) shows behaviour
consistent with this observation. If you attempt an import starting on line
1048576 of a source text file, you will get exactly two rows of data - this
is again consistent (and bad).
3. If you start on line 1048577, then you get only one row and it only
fills in only the first cell of that row with the time data from line 1048577
- so you get the first column but not the second. The rest of the rows in
the 1048576 row text import range are then blank (as per import setting to
replace contents). Likewise, if you attempt to start on line 1048578, then
you still get only one row and it only fills in the first cell of that row
with the time data from line 1048578. It does this even for line 10402254 -
and then you still only got one cell of data from line 10402254 - plus it
takes a couple of minutes to do it.
4. The text import will work at all only as long as you have the
import range set up on a separate sheet. If you attempt to set up two import
data ranges next to one another on the same sheet, then Excel will quit
"unexpectedly" when you try to update the data connection - i.e. it crashes.
I found similar stupid behaviour when you have formulae to the right of the
import range - it either crashes or takes hours (instead of a few seconds) to
complete the operation.
The bad behaviour beyond text file line 1048576 is obviously not what the
Excel programmers intended. The program displays the following friendly
advice when you start your test import on line 1 of a really large text file:
"The text file contains more data then will fit on a single worksheet. To
continue and import as much of the data as will fit, click OK. You can then
import the rest of the data by repeating the import operation on another
worksheet and using the Text Import Wizard to exclude data already imported"
The fact is that you can't.
The indications are that the "Text Import Wizard" is built with text file
line addressing that only uses 20 bits - more like a "Text Import Doofus."
Too bad too, since my calculations indicate that if it worked properly, I
would be able to parse a 10,000,000 line text file with two fixed width
columns into Excel worksheet cells in about 2.5 minutes rather than the ~75
minutes the stripped down vba script I had to write takes to do it line by
line; this was the "solution" offered by a guy in Microsoft Premiere tech
support. I also noted that both vba and the "wizard" use only one processor.
Is there prospect that Microsoft will wise up this Wizard and make it work
correctly??
work using a really long (10,402,255 lines) ascii format text data files with
two columns of fixed width fields. Despite the updates to the Excel 2007
program since I first tried this, the implementation of this feature remains
pitiful.
Here are my findings of how this feature of Excel behaves:
1. If you set up the text import to start on the first line of the
text source file and put them into your worksheet starting on cell A1, then
the importer will capture the first 1048576 lines and put them all into your
worksheet - so it works OK there and it is fast - about 14 seconds on my
machine. 1048576 is the maximum number of rows that you can accommodate on
a single worksheet in Excel 2007.
2. If you set up the importer to start on line 30,001 (which skips the
first 30000 lines), then the importer captures all the lines up to line
1048577 in the source text file. This means you will have two columns of
data up to row 1018577 in your worksheet. This is the first indication that
the text file importer cannot properly access lines in a source text file
beyond a maximum of the first 1048576 (2^20) lines. Attempts to start on
another line (e.g. 65001 will fill down to row 983577) shows behaviour
consistent with this observation. If you attempt an import starting on line
1048576 of a source text file, you will get exactly two rows of data - this
is again consistent (and bad).
3. If you start on line 1048577, then you get only one row and it only
fills in only the first cell of that row with the time data from line 1048577
- so you get the first column but not the second. The rest of the rows in
the 1048576 row text import range are then blank (as per import setting to
replace contents). Likewise, if you attempt to start on line 1048578, then
you still get only one row and it only fills in the first cell of that row
with the time data from line 1048578. It does this even for line 10402254 -
and then you still only got one cell of data from line 10402254 - plus it
takes a couple of minutes to do it.
4. The text import will work at all only as long as you have the
import range set up on a separate sheet. If you attempt to set up two import
data ranges next to one another on the same sheet, then Excel will quit
"unexpectedly" when you try to update the data connection - i.e. it crashes.
I found similar stupid behaviour when you have formulae to the right of the
import range - it either crashes or takes hours (instead of a few seconds) to
complete the operation.
The bad behaviour beyond text file line 1048576 is obviously not what the
Excel programmers intended. The program displays the following friendly
advice when you start your test import on line 1 of a really large text file:
"The text file contains more data then will fit on a single worksheet. To
continue and import as much of the data as will fit, click OK. You can then
import the rest of the data by repeating the import operation on another
worksheet and using the Text Import Wizard to exclude data already imported"
The fact is that you can't.
The indications are that the "Text Import Wizard" is built with text file
line addressing that only uses 20 bits - more like a "Text Import Doofus."
Too bad too, since my calculations indicate that if it worked properly, I
would be able to parse a 10,000,000 line text file with two fixed width
columns into Excel worksheet cells in about 2.5 minutes rather than the ~75
minutes the stripped down vba script I had to write takes to do it line by
line; this was the "solution" offered by a guy in Microsoft Premiere tech
support. I also noted that both vba and the "wizard" use only one processor.
Is there prospect that Microsoft will wise up this Wizard and make it work
correctly??