dealing forcefully with Unicode and non-Unicode characters

da9ve

Two related questions:

1) I have a source document from China, where the author has used a
"registered" symbol. They tell me the symbol was created by Insert | Symbol
| (normal text) and selecting and inserting the 'registered' symbol from
there. This should, as far as I can tell, enter the Unicode 00AE ® symbol.
BUT, I have been unable to toggle the Unicode character code display for that
character in the source document, or when I copy it to another document. Is
there any more forceful way to make Word show me the Unicode code, other than
selecting the character and using Alt+x? Or, alternatively, is there any
reason the above method of creating the character would enter a different
Unicode than what the Insert | Symbol dialog indicates? More info: in one
instance of their use of this character, it is in Times New Roman font, and
in another it is in SimSun font. I don't think this should make any
difference.

2) When I type Ctrl+- (Ctrl plus the number-row hyphen/minus), I get a
character that displays on-screen to look like the logical NOT symbol 00AC,
but it similarly doesn't toggle to a Unicode character when I try to Alt+x
it. What exactly is this Ctrl-hyphen character? When I open a document
containing it in Schlafender Hase's Text Verification Tool (TVT v5.0 beta),
it doesn't even 'see' the character, thus it seems it's not a Unicode
character at all.
 
da9ve

Follow-up on my own question 1) below. I happened to copy-paste the
misbehaving 'registered' symbol into a Find What box in Word, and it
displayed there as an odd cursive-D-looking glyph. I copied and pasted
*that* back into the document text (where it displayed only as a box),
toggled its Unicode, and it comes up as Unicode F0D2, which is a Private Use
Area code. As far as I can tell, somewhere between the original
Chinese-sourced document (created in Word 2003, from an English keyboard on
an HP EliteBook 6930p laptop) and my opening it in Word 2003/Office Pro 2003
SP3, that 'registered' character, created as described below, got replaced
by F0D2. Why would this happen? Or is the substitution happening only when
I copy-paste the character into the Find What box?

da9ve
 
grammatim

Two related questions:

1) I have a source document from China, where the author has used a
"registered" symbol.  They tell me the symbol was created by Insert | Symbol
| (normal text) and selecting and inserting the 'registered' symbol from
there.  This should, as far as I can tell, enter the Unicode 00AE ® symbol.  
BUT, I have been unable to toggle the Unicode character code display for that
character in the source document, or when I copy it to another document.  Is
there any more forceful way to make Word show me the Unicode code, other than
selecting the character and using Alt+x? Or, alternatively, is there any
reason the above method of creating the character would enter a different
Unicode than what the Insert | Symbol dialog indicates? More info: in one
instance of their use of this character, it is in Times New Roman font, and
in another it is in SimSun font.  I don't think this should make any
difference.

There's an R in a circle at Unicode 24C7, which if you're in a Chinese
environment might be what shows up. It might even be a 20DD Enclosing
Circle with an R before it.
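
To tell these candidates apart outside Word, the text can be pasted into a short script that lists each code point. This is only a minimal Python sketch using the standard unicodedata module; the helper name code_points and the CANDIDATES list are made up for illustration, and it assumes the characters survive the copy-paste out of Word unchanged.

```python
import unicodedata

# Candidate ways an "R in a circle" could be encoded (illustrative list).
CANDIDATES = ["\u00AE", "\u24C7", "R\u20DD"]


def code_points(text):
    """Return a 'U+XXXX NAME' string for every character in text."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"
        for ch in text
    ]


for candidate in CANDIDATES:
    print(code_points(candidate))
```

The third candidate comes back as two entries, a plain R followed by the combining circle, which is how it would differ from the single-character forms.
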
2) When I type Ctrl+- (Ctrl plus the number-row hyphen/minus), I get a
character that displays on-screen to look like the logical NOT symbol 00AC,
but it similarly doesn't toggle to a Unicode character when I try to Alt+x
it.  What exactly is this Ctrl-hyphen character?  When I open a document
containing it in Schlafender Hase's Text Verification Tool (TVT v5.0 beta),
it doesn't even 'see' the character, thus it seems it's not a Unicode
character at all.

Ctrl-hyphen isn't a "character," it's the instruction to Word to
insert an optional hyphen. The NOT symbol is only visible when you
have Show Non-Printing Characters turned on (Ctrl-Shift-8).
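
For comparison, the closest Unicode relative of an optional hyphen is U+00AD SOFT HYPHEN, which sits right after U+00AC NOT SIGN. The Python sketch below just shows the two side by side; the helper names info and strip_soft_hyphens are hypothetical, and whether Word ever writes its optional hyphen out as U+00AD is an assumption not confirmed anywhere in this thread.

```python
import unicodedata

NOT_SIGN = "\u00AC"     # the glyph Word displays for an optional hyphen
SOFT_HYPHEN = "\u00AD"  # Unicode's own discretionary ("shy") hyphen


def info(ch):
    """Return (code point, name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


def strip_soft_hyphens(text):
    """Remove U+00AD so that text comparisons ignore discretionary breaks."""
    return text.replace(SOFT_HYPHEN, "")


print(info(NOT_SIGN))
print(info(SOFT_HYPHEN))
print(strip_soft_hyphens("dis\u00ADcre\u00ADtion\u00ADary"))
```

The NOT sign is an ordinary math symbol (category Sm), while the soft hyphen is a format character (category Cf), which is one reason a comparison tool might skip over it.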
 
da9ve

grammatim said:
There's an R in a circle at Unicode 24C7, which if you're in a Chinese
environment might be what shows up. It might even be a 20DD Enclosing
Circle with an R before it.

Well, it turns out that the character apparently IS the 00AE registered sign
- at least, that's what TVT tells me it is (which I hadn't tried yet when I
first posted) - but I just can't convince Word to toggle its Unicode value.
So, the question becomes simpler: Is there any other way, more powerful than
Alt+x, to force Word to toggle the codes?

Ctrl-hyphen isn't a "character," it's the instruction to Word to
insert an optional hyphen. The NOT symbol is only visible when you
have Show Non-Printing Characters turned on (Ctrl-Shift-8).

Yeah, I'd figured that out but forgot to mention that I knew it was a
discretionary hyphen. It was mostly the firm confirmation that it wasn't a
Unicode character that I was hoping to find. So it's apparently a
Word-proprietary version of the "shy" soft hyphen then? Thanks!
 
da9ve

To follow up on my own follow-up, I've since confirmed that the substitutions
are happening at the stage of pasting the eccentric character into the Find
What box - I found a few other characters that behave similarly, resulting in
different Private Use range character substitutions.
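
A quick way to check whether a code such as F0D2 is a Private Use character is to look at its general category, which is "Co" for the whole U+E000 to U+F8FF range. The Python sketch below does that and also pulls out the low byte of codes in U+F000 to U+F0FF. The helper names describe and symbol_font_byte are hypothetical, and the idea that Word parks symbol-font characters in that block (so that F0D2 would correspond to position D2 of a symbol-encoded font) is an assumption here, not something established in this thread.

```python
import unicodedata


def describe(code_point):
    """Return a short description of a code point, flagging private use."""
    ch = chr(code_point)
    if unicodedata.category(ch) == "Co":
        return f"U+{code_point:04X} private use (no standard meaning)"
    return f"U+{code_point:04X} {unicodedata.name(ch, '<unnamed>')}"


def symbol_font_byte(code_point):
    """Return the low byte for codes in U+F000..U+F0FF, else None.

    Assumption: codes in this block stand for a position in a
    symbol-encoded font rather than for a standard Unicode character.
    """
    if 0xF000 <= code_point <= 0xF0FF:
        return code_point - 0xF000
    return None


print(describe(0xF0D2))
print(describe(0x00AE))
print(symbol_font_byte(0xF0D2))
```

A private use code has no meaning by itself, so the same F0D2 can display as a box, a registered sign, or something else depending on the font applied to it.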
 