Importing Text files w/o losing special characters

T

T_o_n_y

I need to import text files into Excel without losing special characters.
I've tried several methods, but each time Excel imports in the file, ignoring
those characters. The following is an example line, but what you can't see
are the 6 special characters which appear between the $$158 and the 1 8!

$$158 1 8 4.50 1.0000 0.8000 3.0010 1.5740

I know they are there, however since I opened the document using Word,
which displays them as a y with 2 dots above them.

My Excel VBA code needs to import these characters so that it doesn't get
lost when extracting the data using MID(,,,) function. The text file were
generated using old FORTRAN programs, and there are thousands of them...my
VBA routines need to access these files in order to modernize our system.

Examples of what I've tried (all of these ignore the y characters)

Workbooks.OpenText Filename:=fname, Origin:=437, _
StartRow:=1, dataType:=xlFixedWidth, FieldInfo:=Array(0, 2)

With ActiveSheet.QueryTables.Add(Connection:="TEXT;" & fname,
Destination:=Cells(2, Col))

Open FName For Input Access Read As #1
While Not EOF(1)
Line Input #1, WholeLine
Cells(RowNdx, ColNdx).Value = WholeLine
RowNdx = RowNdx + 1
Wend
Close #1

I would upload an example file showing the characters if someone tells me
how. I would also tell you what the characters are, again, if someone tells
me how.

Thanks,
Tony
 
T

Tom Ogilvy

put your string in cell A1. Then in B1 or another cell in the first row put
in this formula

=CODE(MID($A$1,ROW(),1))
Assume the above formula is in B1
in C1:
=CHAR(B1)

now select B1:C1 and drag fill down until the formula starts returning
#Value errors.

The only thing between the characters in your post are ascii code 32 which
is a space.

Possibly they didn't get carried forward in the email.
 
N

NickHK

Tony,
Are the columns fixed width, as the parameter dataType:=xlFixedWidth
suggests ?
Are these special characters, >ASCII value 255 ?

As for uploading a samle of the data,
http://savefile.com/

NickHK
 
T

T_o_n_y

Tom,

Thank you for your reply. I followed your procedure but only got four "32"
s in that blank section; that is, there are only 4 spaces there. This
confirms what I've suspected, namely, that Excel is simply not importing
those characters. I've also tried using C. Pearson's Cell View Add-in with
the same result (http://www.cpearson.com/excel/CellView.htm).

As you point out, the characters also get stripped when I cut and paste into
this forum. Therefore, I've emailed you separately the file I referred to as
an attachment (it's a text document called W158.DAT) sent from myother_acct.
If I knew how to post it to this forum, I would.

I appreaciate your help...this is a frustrating problem for me. Is there a
way to import the text file character by character?

-Tony
 
T

T_o_n_y

Thank you, Nick, for showing me how to upload the file. Here is the link for
the file:
http://www.savefile.com/files/150039
To answer your question, the text file data is not fixed with but I've tried
fixed width as well as delimited...nothing works for me...the characters are
always stripped away.
 
T

T_o_n_y

Thank you for showing me how to upload the file.

Here is the link:
http://www.savefile.com/files/150039

To answer your question, the files are not fixed width, although I have
tried that option my attempts to import. No matter what I try, fixed width,
delimited, etc... Excel always seems to strip those characters away and all
I'm left with are the spaces around the special characters...but as I said,
my code needs to count those special characters in order for the string
manipulations to properly gather data.

Thanks again,
Tony
 
T

Tom Ogilvy

put this in a workbook. Change the path to point to your file:

Sub ReadStraightTextFile()
Dim strTest As String
Dim bytArray() As Byte
Dim intcount As Integer
Dim col As Long
Open "E:\Data1\W158.DAT" For Input As #1
col = 0
Do While Not EOF(1)
Line Input #1, strTest
col = col + 1
bytArray = strTest
i = 0
For intcount = LBound(bytArray) To UBound(bytArray)
i = i + 1
Cells(i, col) = bytArray(intcount) & " = " &
Chr(bytArray(intcount))
Next

Loop 'Close the file

Close #1
End Sub


Have blank sheet as the activesheet. Run the macro.

It appears to me that the file is UNICODE. unlike an ascii file that has
one byte per character, a unicode file has two bytes per character.

there are 8 bits to a byte, so an ascii file can have 8^2 = 256
different/unique character codes. In a unicode file, 2 bytes is 16 bits,
so 2^16 = 65536 possible unique characters.

I didn't see any actual characters that couldn't be represented by Ascii, so
you could read every Odd character .


It appears that opening it in Excel automatically converts it to Ascii, so
you haven't lost any information, but if you want to edit it and write it
back out, you would need to save it as Unicode Text. I know that is an
option in at least xl2000 and I assume later.
 
N

NickHK

Tony,
Looking the first 16 bytes of your file in a Hex editor, you have:
24 24 31 35 38 20 20 00 00 00 00 00 00 20 20 31

The "20"=decimal 32, which is a normal space.
The "00" are = vbNullChar and are not printable, I get a "" in Word. Not
sure where your "y with 2 dots above them" came from.
As such, do they actually mean anything ?

Your data seems such a mix of formats, I doubt Excel would be able to make
much sense of it.

You would be better writing your own parsing routine :
<Air Code>
Open "the file" For Input
Do until EndOfFile
LineInput ToAVariable
Call DecideWhatThisLineMeans
'Process the data
Loop
Close file

NickHK
 
T

T_o_n_y

I tried your macro, but unfortunately Excel still did not import the special
characters. Recall that there are 6 special characters between the $$158 and
the 1 8 in the first line of the file:
$$158 1 8 4.50 1.0000 0.8000 3.0010 1.5740

For that section, the output from your macro looked like this:
36 = $
0 =
36 = $
0 =
49 = 1
0 =
53 = 5
0 =
56 = 8
0 =
32 =
0 =
32 =
0 =
32 =
0 =
32 =
0 =
49 = 1
0 =
32 =
0 =
32 =
0 =
56 = 8

In other words, the 6 characters got stripped away again so that all you see
are the 2 spaces which appear on either side of the 6 special characters.

The only way I've found for Excel to even recognize that those characters
exist is to use the "Delimited" option during text import and specify
"spaces" as the delimiting character with the "Treat consecutive delimiters
as one" feature unchecked. Unfortunately, that method of importing would
mean a huge rework of my existing code.

I spent another few hours trying to research the UNICODE possibilty you
mentioned, but still was unable to come up with anything.

At a loss...
-Tony
 
T

T_o_n_y

Art,

I've done as you requested, although the file is also available from the
link I posted earlier, i.e., http://www.savefile.com/files/150039

I've concluded that the special characters are most likely null characters
(i.e., ascii 000), but I'm still not 100% sure. If so, it appears Excel
strips away null characters when importing into cells.

Although it would be best if I could somehow import these characters into an
Excel cell or string variable, a work around for me would be to search for
null characters in multiple text files and replace them with spaces. If
anyone knows how to do this, I'd be interested. I have a program called
"Advanced Search and Replace" with which I've tried to do this, but without
success...again making me wonder if the characters are, in fact, ascii 000.

I appreaciate the continued help.
-Tony
 
T

Tom Ogilvy

Guess it was a waste of time trying to explain it to you. Did you bother to
read it?
 
T

Tom Ogilvy

Art,

open excel

open a new workbook

type a bunch of spaces in the first couple of cells in the first column of
the workbook

Now do Save as and select

Unicode Text (*.txt)

as the file type.

Now close excel. Open the file in word. It if offers to open it as
Unicode, unselect that option and select the option to the left (normal).

when it is open in Word, select the Paragraph symbol so you can see the
character returns. You will see the y with two dots. (so those are the
null characters - the second part/byte of the unicode two byte characters.
 
T

T_o_n_y

Tom,

I'm perplexed at your response because, I'm about as far from ignoring your
posts as possible. Indeed, I generally skip directly to your posts when on
this newsgroup since they are more helpful than anyone's, containing actual
sample code that can be used.

It's just that the output from the macro you sent me led me to the
conclusion that you were mistaken this time. The output shows "0" for every
other cell which is not what I would expect from UNICODE with 2 bytes per
character. Furthermore, rather than revealing the presence of the special
characters, your macro also had them stripped away.

Here's what I mean. The file I've uploaded contains the following in the
first line,

$$158++yyyyyy++1++8++4.50 etc...

I've substituting + for spaces and y for the special characters above. As
you can see there are 2 spaces followed by 6 special characters followed by 2
spaces. The output from your macro completely omits the 6 special
characters, if I'm reading it correctly.

As I wrote, I spent "anoter few hours" researching into UNICODE in order to
investigate the possibility you raised...but nothing I found seemed to
confirm it. In addition, Excel has two different UNICODE types (UTF-8 and
UTF-7) which one can select in the text import wizard. I tried both of them
and neither gave me success in importing the special characters, as judged by
using c pearson's CellView add-in, which allows character by character
visualization of cell contents.

Thank you again for your help,
-Tony
 
T

Tom Ogilvy

One added step. I formatted the cells in Excel as MS Ariel Unicode before
doing the saveas. Don't know if that is necessary or not.
 
T

T_o_n_y

Art H,
In response to your request, I'm posting two screen shots related to the

special characters I'm having trouble inputing into Excel. This one shows

the file in Word while in the mode which "shows" special characters:
http://www.savefile.com/files/154330
This one shows the file while in the mode which "hides" special characters:
http://www.savefile.com/files/154335
I should have mentioned initially that the 6 special characters show as y's

with 2 dots in the former mode and rectangles in the later.

-Tony
 
N

NickHK

Tony,
The "special characters" I see in your uploaded file are Asc(0). It is not a
Unicode file.

As for the "0" for every other cell, that is expected if you have all ANSI
text stored in a UNICODE format. The lower byte will always be 0 as no
values exceed decimal 255 or FF hex.

So do you have a Unicode file or not ?

NickHK
 
T

T_o_n_y

Tom,

I did what you suggested, typing spaces into Excel cells, saving as UNICODE,
and then opening it in Word.

I did the same thing with other samples of number and text.

Since null characters are bytes full of zeros, it seems to me that one would
inevitably see the null

characters THROUGHOUT any file that was saved as unicode (since low code
values will always have zeros in

the front). By contrast, the file that I've uploaded contains the null
character symbols in only one

location. Unless it is possible for a file to be formatted as both
non-unicode and unicode (I'm not even

sure what that would mean), I don't see how this could be a unicode file.

I admit that my grasp of this is incomplete at best, but it seems most
likely to me that what is going on

here is that null ascii characters were at one time inserted into these text
files and now, for some

reason, Excel strips them away when importing. If Excel didn't strip them
away, I would have the problem, and it's remarkable to me that you can't even
import nulls into a string.

I also attempted to insert a null character into an Excel cell using the
INSERT-->Symbol feature, but was unable to access null character. It seems
Excel refuses to have this character.


-Tony
 
T

T_o_n_y

Nick,

What you've written makes sense to me, and that is why I've had trouble
concluding that it is a unicode file. As I wrote in an earlier post, I'm
leaning toward it being a text file that has 6 Ascii(0) characters inserted
into it.

The question remains: is there a way to have Excel import these characters
rather than completely ignoring them?

-Tony
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top