Unicode Word 2004 and Windows.

Jan Aukes

I bought Office 2004 recently, so I wanted to try out the new Unicode
possibilities.
As a test I sent a document with several weird characters to my
work e-mail address. Word 2003 for Windows opened this document
perfectly.
But then I added some more characters to the document and sent it back to
my Mac.
When I opened the document again with Word 2004, nearly all the
characters were gone! There were a lot of squares.
So what went wrong?
 
John McGhie

Hi Jan:

You added the characters from a font you do not have installed on the Mac.

When you insert a character in a Unicode document, you actually insert a
number: the number of the character. Later, the system finds the outline
for the character when it needs to print it (or display it: to the
computer, the screen is simply a low-resolution "printer").
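
To make the "you actually insert a number" point concrete, here is a minimal sketch in Python (chosen purely for illustration; it has nothing to do with how Word itself is implemented). The built-in `ord` and `chr` functions convert between a character and its Unicode number. The sample characters and the helper name `code_point_label` are arbitrary choices for this example.

```python
# Each character in a Unicode document is stored as a number (its code point).
# The sample characters are arbitrary; escapes are used so the exact
# code points are unambiguous.
samples = ["A", "\u00e9", "\u03a9", "\u20ac"]  # A, e-acute, Greek Omega, euro sign


def code_point_label(ch):
    """Return the conventional U+XXXX label for a single character."""
    return "U+{:04X}".format(ord(ch))


for ch in samples:
    # Prints the character, its number in decimal, and its U+XXXX label.
    print(ch, ord(ch), code_point_label(ch))

# Going the other way: the number alone identifies the character.
# Whether it can be shown depends on a font supplying an outline for it.
euro = chr(0x20AC)
```

The document only stores the numbers; the shapes come from whichever font is used when the text is displayed or printed.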

On Windows, the system looks first in the font you are using for the
character. If that font does not contain the character, Windows looks back
through its installed fonts until it finds a font which does have the
character.
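
The lookup described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not how Windows or Word actually implements it: the coverage tables are invented and tiny, the search order is simply the order of the dictionary, and the name `find_font` is hypothetical.

```python
# Hypothetical coverage tables: font name -> set of code points it can draw.
# Real fonts cover far more; these small sets are made up for the example.
INSTALLED_FONTS = {
    "Times New Roman": {0x0041, 0x00E9},
    "Arial": {0x0041, 0x00E9, 0x20AC},
    "Arial Unicode MS": {0x0041, 0x00E9, 0x20AC, 0x0628, 0x4E2D},
}


def find_font(ch, current_font, fonts=INSTALLED_FONTS):
    """Return the name of a font that can draw ch, or None if no font can."""
    code_point = ord(ch)
    # First choice: the font the text is formatted in.
    if code_point in fonts.get(current_font, set()):
        return current_font
    # Otherwise search the other installed fonts (order is an assumption here).
    for name, coverage in fonts.items():
        if name != current_font and code_point in coverage:
            return name
    # No font has the character: this is where a square box would appear.
    return None


# The same lookup on a machine without the big fallback font.
fonts_without_fallback = {
    name: cov for name, cov in INSTALLED_FONTS.items() if name != "Arial Unicode MS"
}

print(find_font("\u4e2d", "Times New Roman"))
print(find_font("\u4e2d", "Times New Roman", fonts_without_fallback))
```

With the large fallback font present the search succeeds; with it removed the same character has no font at all, which is the situation that produces the squares.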

If Microsoft Office has been installed on Windows, it will by default have
installed Arial Unicode MS. This font contains all the characters there are
in the Unicode 3.2 standard, which means the search will always succeed: the
character will always be found.

Microsoft wanted to ship Arial Unicode MS with Word 2004, but they struck
problems and were unable to get it working on the Mac. Since there is no
Mac equivalent, you get this problem. It is possible to insert characters
in Windows that no common font on the Mac contains. If you do, you get
square boxes...

Eventually, font manufacturers will enhance their fonts for the Mac so that
this condition becomes less frequent. But currently, most Mac fonts contain
only 260 characters, while most PC fonts contain 512.

The exceptions are Times New Roman, and Arial. The Mac versions of those
are both "expanded" fonts with 512 characters. However, Arial Unicode MS
contains 32,000 characters, so as you can see we are still a way short of
having everything.

Now, since there is a PC somewhere near you, I tell you this: if the Arial
Unicode MS font were to somehow find its way from that PC to your Mac, it
would "work". Of course, that would be highly illegal and a breach of
copyright and quite naughty and you would be not a gentleman if you were to
do such a dastardly thing. So I am sure you would not even think of it.

But if you "were" to think of it, you should also be aware that the font
does not look very nice, it's huge, takes up a lot of memory, and apparently
some characters in it could conceivably cause crashes. It's an OpenType
font that contains macros to draw the characters. Some of those macros can
apparently cause crashes. I've never seen it, but apparently it can happen.

In case anyone is about to call the Police, this Mac has a copy of Virtual
PC which just happens to contain a legal copy of Arial Unicode MS. I've had
no trouble with it, but I very very rarely use it.

Cheers


--

Please reply to the newsgroup to maintain the thread. Please do not email
me unless I ask you to.

John McGhie <[email protected]>
Consultant Technical Writer
Sydney, Australia +61 4 1209 1410
 
Paul Berkowitz

The exceptions are Times New Roman, and Arial. The Mac versions of those
are both "expanded" fonts with 512 characters. However, Arial Unicode MS
contains 32,000 characters, so as you can see we are still a way short of
having everything.

Actually, the Mac MS Unicode fonts are Times New Roman, Verdana, Trebuchet
MS, and one of the Asian fonts (I always forget which - MS PMincho or MS
PGothic, I think). The assertion that these fonts have "512 characters" is
absurd, John. They have thousands upon thousands of characters, but not the
32,000 of Arial Unicode MS. (Lucida Grande, an Apple font, has more than
they do, still not the full 32,000 in Panther, but this font is unknown on
Windows.)

The best way to see the range of characters in each font is by checking
"Character Palette" in System Preferences/International/Input Menu, then
selecting Character Palette in the Input menu (the flag menu). Browse through
all the Unicode blocks, selecting characters as you go and checking the fonts
in the lower section.

--
Paul Berkowitz
MVP MacOffice
Entourage FAQ Page: <http://www.entourage.mvps.org/faq/index.html>
AppleScripts for Entourage: <http://macscripter.net/scriptbuilders/>

Please "Reply To Newsgroup" to reply to this message. Emails will be
ignored.

PLEASE always state which version of Microsoft Office you are using -
**2004**, X or 2001. It's often impossible to answer your questions
otherwise.
 
Paul Berkowitz

Actually, the Mac MS Unicode fonts are Times New Roman, Verdana, Trebuchet MS,
and one of the Asian fonts (I always forget which - MS PMincho or MS PGothic,
I think). The assertion that these fonts have "512 characters" is absurd,
John. They have thousands upon thousands of characters, but not the 32,000 of
Arial Unicode MS. (Lucida Grande, an Apple font, has more than they do, still
not the full 32,000 in Panther, but this font is unknown on Windows.)


Well, I was taking your word for the "32,000 characters". I've done a bit of
research. Alan Wood (<http://www.alanwood.net/unicode/fonts.html>) says that
Arial Unicode MS (as of Word 2002, Office XP) has all the characters of the
Unicode 2.0 standard - 51,180 characters
(<http://www.alanwood.net/unicode/fonts.html#arialunicodems>). In the
meantime, Unicode has reached version 4.0.1 and now has 96,447
characters (<http://www.unicode.org/standard/principles.html>).

I don't know how many characters Times New Roman, Verdana and Trebuchet MS
Unicode versions have. They may be confined to Western European characters
with some variants - I think you must have been a lot closer with your "512"
than I expected. Apologies. TNR and Verdana are hardly to be found even in
the Latin Extended-A set of the Character Palette - Latin characters with
accents and diacritics of Central European languages. Trebuchet MS does
appear there sporadically, and MS PMincho and MS PGothic are very prevalent
there and in many other character sets. (Lucida Grande is everywhere, and
even several other Apple fonts are very prevalent, but that doesn't help for
Word where you'd want only MS fonts that will be found also in Word
Windows.)
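
For anyone who wants to see what is in the Latin Extended-A block without clicking through the Character Palette, here is a small sketch using Python's standard `unicodedata` module. It only lists what the Unicode standard defines for U+0100 through U+017F; it says nothing about which font on your machine contains those characters. The helper name `describe_block` is made up for this example.

```python
import unicodedata

# Latin Extended-A occupies U+0100 through U+017F in the Unicode standard.
LATIN_EXTENDED_A = range(0x0100, 0x0180)


def describe_block(code_points):
    """Return (label, character, name) tuples for the assigned code points."""
    rows = []
    for cp in code_points:
        ch = chr(cp)
        name = unicodedata.name(ch, None)  # None for unassigned code points
        if name is not None:
            rows.append(("U+{:04X}".format(cp), ch, name))
    return rows


# Show the first few entries of the block.
for label, ch, name in describe_block(LATIN_EXTENDED_A)[:8]:
    print(label, ch, name)
```

Pasting a handful of these characters into a test document is a quick way to check which of them a given font really displays.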

 
John McGhie

Hi Paul:

Yeah, it's an absolute problem trying to work out how many characters each
font has. The core Windows fonts support Windows Glyph List 4, which
potentially contains 650 characters.

As explained here:
http://www.microsoft.com/typography/unicode/cscp.htm

Regrettably, not all fonts contain the whole list. Alan Wood's information
is probably the most accurate and up-to-date. As you point out, Alan says
Arial Unicode MS contains 51,180 characters. I didn't think it went above
32,000, but Alan is much more likely to be right :)

Cheers

 
Jan Aukes

Hi John,

Thanks for your extensive explanation. So I understand that it is
important which font I choose. I have now learned to look closely at the font
family when I insert strange characters.
I use PopChar for inserting.
My default font is Times New Roman, but that is the one with the fewest
possibilities. So I understand that Times New Roman on the Windows
computer has more.
I also understand now that PopChar uses Lucida Grande when it shows
green characters, and I see on my Mac that it sometimes turns into New
York as the alternative font.
So when the standard font on my Windows computer is Times New Roman, the
troubles begin.
Do you have a suggestion for a common font?
 
John McGhie

Hi Jan:

I would use Times New Roman, but get a copy that supports the WGL4 character
set (the one installed by Word 2004 will be fine).

Cheers


 
Andreas Prilop

Regrettably, not all fonts contain the whole list. Alan Wood's information
is probably the most accurate and up-to-date. As you point out, Alan says
Arial Unicode MS contains 51,180 characters.

No, he doesn't. Alan Wood writes 51 180 glyphs, which is something different.
For example, one Arabic character usually has four different glyphs.
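
The "one character, four glyphs" point can be illustrated with Python's standard `unicodedata` module. Unicode happens to encode the positional shapes of Arabic letters as separate compatibility characters in the Arabic Presentation Forms-B block, so they can be listed without opening a font. This is only a proxy: a real font selects its glyphs through its own shaping tables, and the helper name `presentation_forms` is made up for this sketch.

```python
import unicodedata


def presentation_forms(base):
    """List (code point, form tag) pairs in Arabic Presentation Forms-B
    whose compatibility decomposition is exactly the given base letter."""
    target = "{:04X}".format(ord(base))
    forms = []
    for cp in range(0xFE70, 0xFF00):  # Arabic Presentation Forms-B block
        parts = unicodedata.decomposition(chr(cp)).split()
        # A matching entry looks like "<initial> 0628": a tag plus one code point.
        if len(parts) == 2 and parts[1] == target:
            forms.append((cp, parts[0].strip("<>")))
    return forms


# U+0628 is ARABIC LETTER BEH: one character, several positional shapes.
for cp, form in presentation_forms("\u0628"):
    print("U+{:04X}".format(cp), form)
```

For BEH this finds the isolated, final, initial and medial forms: four shapes for a single character, which is why a glyph count and a character count for the same font can differ so much.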
 
Paul Berkowitz

No, he doesn't. Alan Wood writes 51 180 glyphs, which is something different.
For example, one Arabic character usually has four different glyphs.

Right. In this case Alan doesn't say how many characters. Perhaps John's
"32,000" was a good approximation. I can't find an authoritative total
anywhere. Glyph total is probably more useful anyway for fonts that include
multi-glyph characters.

Alan does mention some Mac Office 2004 fonts specifically
(<http://www.alanwood.net/unicode/fonts_macosx.html>):

Times New Roman 3.05 in Office 2004 has 1176 glyphs. (v3.00 in Windows XP
SP2 has 1419 characters (1674 glyphs).)

Arial 3.05 in Office 2004 has 1186 glyphs. (v3.00 in Windows XP SP2 has
1419 characters (1674 glyphs).)

Verdana 2.45 in Office 2004 has 686 glyphs (he says "glyphs" - maybe he
means "characters"). (v2.43 in Windows XP SP2 has 680 characters (893
glyphs).)

Trebuchet MS 1.26 in Office 2004 has 583 glyphs. (v1.23 in Windows XP has
577 characters (576 glyphs).)

MS PGothic 2.52 in Office 2004 has 15,739 characters (22,319 glyphs). (v2.30
in Windows XP has 14,965 characters (20,458 glyphs).)

MS Gothic in Office 2004: same.

MS PMincho 2.52 in Office 2004 has 15,739 characters (19,350 glyphs). (v2.30
in Windows XP has 14,965 characters (17,807 glyphs).)

MS Mincho in Office 2004: same.


 
John McGhie

Oh Damn! Andreas, you are quite correct, and I was being soooo careful too
:)

Alan says it "contains all characters from the Unicode 2.0 standard" and
that it contains "51,180 glyphs".

You are, of course, quite correct. A glyph is a "shape"; a character calls
one or more "shapes" to make its outline.

I "thought" Arial Unicode MS actually implemented the Unicode 3.2 standard,
but I don't know where I got that information, and now I can't find it. So
I would go with Alan's information.

People interested in the definitive source should visit Alan's website:
http://www.alanwood.net/

Cheers


 
