Compare strings containing symbols

Brett Grant

I am using Word 2000.

I have a document that describes a software product. In the document,
for each function, there are a number of tables that delineate the
local variables, public variables, persistents, etc. The last section
under each function contains some pseudocode. I wrote a parser for
the pseudocode that returns an array of strings containing all of
the variables in the pseudocode. The variables in the tables are
available to me as an array of ranges. The problem is that these
variables may contain Greek symbols inserted from the Insert|Symbol
dialog box, such as gamma, delta, omega, etc.

Well, I did a search and found all of the problems associated with
inserted symbols. I was playing around in the immediate window and
found that if I do this:

print strcomp(String_array(1),range_array(1).text,vbTextCompare)

it returns +1 or -1.

This seems odd to me since if I print each of strcomp inputs I get
something like this:

print String_array(1)
double_?e
print range_array(1).text
double_?e

These are exactly the same, so strcomp should return a 0, but it
doesn't.
On the off chance that the characters may be different I checked:

print asc(mid(String_array(1),8,1))
63
print asc(range_array(1).characters(8).text)
63

which tells me that they should be the same. I then tried to strcomp
just the two characters and it returns +/- 1 and I didn't expect that.

So in desperation, I selected the pseudocode variable and tried this:

print strcomp(range_array(1).text,selection.range.text,vbTextCompare)

and it returns a 0!

I don't understand why this works, whereas all of the previous tries
do not. The obvious solution is to rewrite my pseudocode parser to
return the ranges, but I am just trying to understand why one method
works, while the other does not. Are there some undocumented
properties of strcomp that allow it to figure out the mystery
character? Does anybody else have any experience with this? I couldn't
find anything with a Google search.

Thanks,
Brett
 
Brett Grant

Klaus Linke said:
Hi Brett,


63 is the code of a "?". You should use AscW instead of Asc, because Word
uses Unicode (... there are no Greek letters in ASCII/ANSI, so you may get
question marks for them).

If you get negative codes with AscW, post back. In that case, the "Symbol"
font has been used, which adds a whole new set of problems.
In case you get 40 for the code, you are really in for trouble. Old
versions of Word used to hide symbols behind this code so you couldn't
accidentally change the symbol. Finding out the real code is hard and slow
in that case.

Regards,
Klaus

Well, I know that they are from the Symbol font. The 63rd symbol in
the font is omega, which was the correct symbol. Perhaps that is just
a coincidence.

Anyway, a string comp on the text from the ranges works just fine,
even if the fonts are mixed.

Klaus - to answer your question, if I use AscW, it returns -3977.
What new set of problems does this introduce?

Thanks,
Brett
 
Klaus Linke

Hi Brett,
>>> print asc(range_array(1).characters(8).text)
>>> 63
>> 63 is the code of a "?". You should use AscW instead of Asc [...]
> Well, I know that they are from the symbol font. The 63rd
> symbol in the font is omega, which was the correct symbol.
> Perhaps that is just coincident.

The capital omega in the Symbol font has code 87, the small one code 119.
Code 63 = "?" in the Symbol font is just "?".

> Anyway, a string comp on the text from the ranges works just fine,
> even if the fonts are mixed.
> Klaus - to answer your question, if I use ascw, a -3977 is returned.
> What new set of problems does this introduce?

Word uses Unicode. It can't figure out that character 119 from the "Symbol"
font is a small omega (which would be code U+03C9 = 969 in Unicode).

And it can't use the code 119 either because 119 in Unicode is a "w".

So it uses a special code page for fonts like "Symbol", starting at &HF000.
&HF000 converted to a signed integer is -4096.
-4096+119=-3977, so that is the code returned by AscW.
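
The arithmetic above can be reversed to recover the code inside the symbol font. The following is only a rough sketch: the function name SymbolFontCode is made up for illustration, and it assumes the character really does sit in the &HF000 to &HF0FF range described above.

```vba
' Hypothetical helper: maps a character back to its 0-255 code within a
' symbol font, assuming Word stored it in the &HF000-&HF0FF range.
' Returns -1 for any character outside that range.
Function SymbolFontCode(ByVal ch As String) As Long
    Dim code As Long
    code = AscW(ch)
    ' AscW returns a signed 16-bit Integer, so codes above &H7FFF
    ' come back negative; shift them into the 0-65535 range first.
    If code < 0 Then code = code + 65536
    ' The trailing & makes these Long literals (61440 and 61695).
    If code >= &HF000& And code <= &HF0FF& Then
        SymbolFontCode = code - &HF000&
    Else
        SymbolFontCode = -1
    End If
End Function
```

In the immediate window, `? SymbolFontCode(range_array(1).Characters(8).Text)` should then give 119 for the small omega, if the character is stored as described.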

As for your problems with StrComp: neither the VBA editor nor message boxes
can properly deal with Unicode, much less with the code page starting at
&HF000 (which is reserved for "private use"). So both will display a
question mark.

In your original post you said
print String_array(1)
double_?e
print range_array(1).text
double_?e

You can either insert the Unicode string into a Word document to see what
character the "?" really is (Selection.InsertAfter String_array(1) ...), or
analyze the "?" with AscW.
Since it is the 8th character in the string:
? AscW(Mid(String_array(1), 8, 1))
? AscW(Mid(range_array(1).Text, 8, 1))

If StrComp gave you +1 or -1 (the strings are not identical), you'll probably
get different results for this character.
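
If the position of the odd character isn't known in advance, a small loop can find it. This is a minimal sketch rather than a tested utility; FirstDifference is a hypothetical name, not something built into VBA or Word.

```vba
' Hypothetical helper: returns the 1-based position of the first character
' whose AscW code differs between the two strings, or 0 if the strings
' are identical. The differing codes are printed to the immediate window.
Function FirstDifference(ByVal a As String, ByVal b As String) As Long
    Dim i As Long
    Dim n As Long
    ' Only compare up to the length of the shorter string.
    n = Len(a)
    If Len(b) < n Then n = Len(b)
    For i = 1 To n
        If AscW(Mid$(a, i, 1)) <> AscW(Mid$(b, i, 1)) Then
            Debug.Print i, AscW(Mid$(a, i, 1)), AscW(Mid$(b, i, 1))
            FirstDifference = i
            Exit Function
        End If
    Next i
    ' One string is a prefix of the other (for example a trailing space).
    If Len(a) <> Len(b) Then FirstDifference = n + 1
End Function
```

For example: `? FirstDifference(String_array(1), range_array(1).Text)`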

(BTW, you would probably be better off using vbBinaryCompare instead of
vbTextCompare)
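
To illustrate the difference between the two modes with plain letters (the variable names here are invented for the example): a text comparison ignores case, while a binary comparison requires the character codes to match exactly, which is usually what you want for identifiers.

```vba
Sub CompareModes()
    ' Text comparison treats upper and lower case as equal.
    Debug.Print StrComp("Gamma", "gamma", vbTextCompare)    ' prints 0
    ' Binary comparison works on the character codes: "G" (71) < "g" (103).
    Debug.Print StrComp("Gamma", "gamma", vbBinaryCompare)  ' prints -1
End Sub
```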

Regards,
Klaus
 
Brett Grant


Well, what I ended up doing is this:

strcomp(trim(range1.text),trim(range2.text),vbtextcompare)

and it works just fine. The trim is necessary because every once in a
while, an extra space will append itself to the variable name, but
that is a whole other problem and this was easier.

However, if I first assign the text to string variables:
string1 = trim(range1.text)
string2 = trim(range2.text)

The following do not work, i.e. they return a zero:
strcomp(string1,string2)
strcomp(string1,trim(range2.text))
strcomp(trim(range1.text),string2)

Go Figure.

Thanks for the help,
Brett

ps - what part of Germany are you in?
 
