\fcharset vs \cpg

Jialiang Ge [MSFT]

Hello Dave,

The difference between \fcharset and \cpg is this:
\fcharset specifies a character set. A character set is all the characters
in a font; it might be a few hundred or many thousands.
\cpg specifies a code page. A code page is the set of characters (or a
subset of a large font) that can be typed directly from the keyboard for a
particular keyboard layout.
For more information about the difference between character set and
codepage, please see:
http://www.microsoft.com/typography/unicode/cscp.htm
http://en.wikipedia.org/wiki/Character_set
http://en.wikipedia.org/wiki/Code_page

Regards,
Jialiang Ge ([email protected], remove 'online.')
Microsoft Online Community Support

==================================================
For MSDN subscribers whose posts are left unanswered, please check this
document: http://blogs.msdn.com/msdnts/pages/postingAlias.aspx

Get notification to my posts through email? Please refer to
http://msdn.microsoft.com/subscriptions/managednewsgroups/default.aspx#notifications.
If you are using Outlook Express/Windows Mail, please make sure
you clear the check box "Tools/Options/Read: Get 300 headers at a time" to
see your reply promptly.

Note: The MSDN Managed Newsgroup support offering is for non-urgent issues
where an initial response from the community or a Microsoft Support
Engineer within 1 business day is acceptable. Please note that each follow
up response may take approximately 2 business days as the support
professional working with you may need further investigation to reach the
most efficient resolution. The offering is not appropriate for situations
that require urgent, real-time or phone-based interactions or complex
project analysis and dump analysis issues. Issues of this nature are best
handled working with a dedicated Microsoft Support Engineer by contacting
Microsoft Customer Support Services (CSS) at
http://msdn.microsoft.com/subscriptions/support/default.aspx.
==================================================
This posting is provided "AS IS" with no warranties, and confers no rights.
 
David Thielen

My point is they seem to map 1:1.

A codepage basically maps SBCS and/or DBCS characters to their Unicode
equivalents. (Yes, there are other kinds of codepages, but for RTF & Windows in
general, that is what they are used for.) So there is a codepage for Western
Europe, one for Greek, one for Japanese, etc.

Now a charset as used in RTF & Windows appears to be a subset of Unicode,
again for a given alphabet such as Western European, Greek, etc.

I don't think a font ties directly to this because a font generally has
multiple charsets in it, and can be missing just 1 or 2 characters from a
charset (older fonts don't have the Euro character).

Anyway, my question is: don't they always map 1:1? For example, don't
charset 161 and codepage 1253 always go together? I don't see how you can
have one without the other.

This is what I am asking about - why are both specified if they are always
tied 1:1?

--
thanks - dave
david_at_windward_dot_net
http://www.windwardreports.com

Cubicle Wars - http://www.windwardreports.com/film.htm
 
Jialiang Ge [MSFT]

Hello Dave,

Codepages and charsets do not have a 1:1 mapping:

Code pages 437, 850 and 1252 all map to ANSI_CHARSET, and all five of the
Arabic code pages (708, 709, 710, 711, 720) more or less map to
ARABIC_CHARSET (178), whose Windows code page is 1256.

Regards,
Jialiang Ge ([email protected], remove 'online.')
Microsoft Online Community Support

=================================================
When responding to posts, please "Reply to Group" via your newsreader
so that others may learn and benefit from your issue.
=================================================
This posting is provided "AS IS" with no warranties, and confers no rights.
 
Jialiang Ge [MSFT]

Hello Dave,

Sorry for my delayed response. I consulted with the product team to get a
more accurate answer for you.

According to their response, it is actually the other way around: if you
know the charset, you know the codepage, because the codepages include all
the charsets. There is one twist: SYMBOL_CHARSET corresponds to codepage 42,
at least here in Office. Fonts like Symbol and Wingdings have
SYMBOL_CHARSET.

They told me that ANSI_CHARSET corresponds only to 1252. If you want 437 or
850, you should use \cpg437 or \cpg850, respectively. But hopefully you can
forget about using codepages in RTF except for those with standard charsets.
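
To illustrate why 437 and 850 need an explicit \cpg, here is a minimal Python sketch that decodes one byte value under the three code pages mentioned above. It uses Python's built-in codecs only; the helper name decode_with is made up for this example and has nothing to do with RichEdit.

```python
# The same 8-bit value decodes to different characters depending on the
# code page, so a reader that assumes 1252 would misread 437/850 text.
raw = bytes([0x82])

def decode_with(codepage: int, data: bytes) -> str:
    # Hypothetical helper: Python names these codecs "cp437", "cp850", "cp1252".
    return data.decode(f"cp{codepage}")

for cp in (437, 850, 1252):
    print(cp, repr(decode_with(cp, raw)))
```

Under 437 and 850 the byte is a lowercase e with acute accent, while under 1252 it is a low single quotation mark.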

If \fcharsetN appears in the \fonttbl entry, RichEdit's RTF reader uses the
corresponding codepage for conversion purposes. If a \cpgN appears, that N
is used for conversion purposes. RichEdit doesn't ever write \cpgN, since
noncharset text runs can be written using Unicode control words \uN. For
example, for Shift-JIS, \fcharset128 (SHIFTJIS_CHARSET) is all that's
needed for reading and writing RTF. The generated RTF will not mention
codepage 932. The next version of the RTF spec will make this clearer.
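
The lookup described above can be sketched in Python as follows. This is only an illustration, not RichEdit's actual code: the function names are hypothetical, the table lists only some of the standard Windows charset values, and two behaviours are assumptions on my part (that \cpgN wins when both control words appear in one font entry, and that 1252 is the fallback when neither does).

```python
import re

# Partial table of Windows charset values and their code pages.
CHARSET_TO_CODEPAGE = {
    0: 1252,    # ANSI_CHARSET
    2: 42,      # SYMBOL_CHARSET (the special case noted above)
    128: 932,   # SHIFTJIS_CHARSET
    129: 949,   # HANGUL_CHARSET
    134: 936,   # GB2312_CHARSET
    136: 950,   # CHINESEBIG5_CHARSET
    161: 1253,  # GREEK_CHARSET
    162: 1254,  # TURKISH_CHARSET
    177: 1255,  # HEBREW_CHARSET
    178: 1256,  # ARABIC_CHARSET
    186: 1257,  # BALTIC_CHARSET
    204: 1251,  # RUSSIAN_CHARSET
    222: 874,   # THAI_CHARSET
    238: 1250,  # EASTEUROPE_CHARSET
}

def resolve_codepage(font_entry: str, default: int = 1252) -> int:
    """Pick the code page used to convert text in this font (hypothetical)."""
    cpg = re.search(r"\\cpg(\d+)", font_entry)
    if cpg:  # assumption: an explicit \cpgN takes precedence
        return int(cpg.group(1))
    fcs = re.search(r"\\fcharset(\d+)", font_entry)
    if fcs:
        return CHARSET_TO_CODEPAGE.get(int(fcs.group(1)), default)
    return default  # assumption: fall back to 1252

def decode_hex_escapes(run: str, codepage: int) -> str:
    """Decode the \\'hh escapes in a text run using the given code page."""
    data = bytes(int(h, 16) for h in re.findall(r"\\'([0-9a-fA-F]{2})", run))
    # Python has no codec for code page 42, so symbol fonts are not handled.
    return data.decode(f"cp{codepage}")

greek = resolve_codepage(r"{\f2\fcharset161 Arial Greek;}")
print(greek, decode_hex_escapes(r"\'e1\'e2", greek))
```

With this table, a font entry carrying only \fcharset128 resolves to 932 without any \cpg, which matches the Shift-JIS case described above.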

Regards,
Jialiang Ge ([email protected], remove 'online.')
Microsoft Online Community Support

 