Non-standard characters on Web

Dan

I just posted a fairly massive document to the Web. The results were
disappointing. I saved everything as Web documents using Word X. I
used quite a few non-standard characters -- em dashes, curly quotes,
curly apostrophes, occasional accent marks over e's, etc. In opening
them in Firefox from my hard drive, everything looked fine. But once I
uploaded the files and opened them in Firefox on the Web, everything
went flooey, and I had to dive in and strip out every non-standard
character I had. Does anyone know what happened, and what I can do
about it?

Thank you,
DK
 
Corentin Cras-Méneur

Dan said:
I just posted a fairly massive document to the Web. The results were
disappointing. I saved everything as Web documents using Word X. I
used quite a few non-standard characters -- em dashes, curly quotes,

It is an issue indeed. The option to export as an HTML document from
Word is less than perfect in many cases, and I've seen problems like that
in the past. The problem lies with the encoding of the HTML file.

I tend to re-open the files in BBEdit to correct the headers, and I often
convert the special characters to proper HTML entities (instead of
relying on some variation of UTF to encode them).
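
For anyone without BBEdit, the entity conversion described above can be sketched in a few lines of Python. This is only an illustration and not what BBEdit does internally. The function name to_entities is made up for the example, and it assumes the input is already HTML, so it leaves &, < and > untouched.

```python
from html.entities import codepoint2name


def to_entities(text: str) -> str:
    """Replace every non-ASCII character with an HTML character reference.

    Named entities (such as &mdash;) are used where HTML defines one;
    everything else falls back to a numeric reference (such as &#345;).
    """
    pieces = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            # Plain ASCII survives any encoding mix-up, so keep it as is.
            pieces.append(ch)
        elif code in codepoint2name:
            pieces.append("&" + codepoint2name[code] + ";")
        else:
            pieces.append("&#" + str(code) + ";")
    return "".join(pieces)


# An accented e, an em dash and a pair of curly quotes.
sample = "caf\u00e9 \u2014 \u201cquoted\u201d"
print(to_entities(sample))
```

The result contains only ASCII, so it no longer matters which encoding the server or the browser assumes.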


Corentin
 
Dan

Corentin said:
It is an issue indeed. The option to export as an HTML document from
Word is less than perfect in many cases, and I've seen problems like that
in the past. The problem lies with the encoding of the HTML file.

I tend to re-open the files in BBEdit to correct the headers, and I often
convert the special characters to proper HTML entities (instead of
relying on some variation of UTF to encode them).

Corentin

--
--- Mac:MS MVP (Francophone) http://www.cortig.net/wordpress/---
http://www.mvps.org - http://mvp.support.microsoft.com
MVPs are not MS employees - Les MVP ne travaillent pas pour MS
Remove "NoSpam" to e-mail me - Retirez "NoSpam" pour m'écrire

But I don't think that explains why the Word-generated HTML files
looked fine in Firefox when I opened them on my hard drive, but were
messed up when I opened them on the Web. Does it?
 
Corentin Cras-Méneur

Dan said:
But I don't think that explains why the Word-generated HTML files
looked fine in Firefox when I opened them on my hard drive, but were
messed up when I opened them on the Web. Does it?

Well, if there is no encoding information in the header, the browser will
fall back on the default encoding for the file, and it might be displayed
correctly from your Mac.
When you upload it, though, this file encoding can get converted to a
default for the server (usually NOT UTF-8). That's why it's important to
have a proper header declaration in the document itself, since the file
encoding is often unreliable.
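
The effect of a wrong guess can be reproduced without a server. The sketch below is an illustration under assumptions: the helper name misread is invented, and the three encodings are only examples of what a Mac, a Windows-minded server or a browser might pick. It saves a string in one encoding and reads the same bytes back in another.

```python
def misread(text: str, saved_as: str, read_as: str) -> str:
    """Encode text one way, then decode the same bytes another way."""
    raw = text.encode(saved_as)
    # errors="replace" keeps the demo from stopping on undecodable bytes.
    return raw.decode(read_as, errors="replace")


word = "caf\u00e9"  # "cafe" with an accented e

# Same encoding on both sides: the text comes back intact.
print(misread(word, "utf-8", "utf-8"))

# UTF-8 bytes read as Windows-1252: the accented e becomes two characters.
print(misread(word, "utf-8", "cp1252"))

# Mac OS Roman bytes read as Windows-1252: a different wrong character.
print(misread(word, "mac_roman", "cp1252"))
```

The bytes never change in any of the three cases. Only the label applied to them does, which is why an explicit declaration in the file matters.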

To test the hypothesis, you could re-download the HTML file from the
server and check how it displays from your Mac.
You could even open it in a text editor and check what the extended
characters look like. I would suspect that they are all corrupted and
won't display properly.
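
A small script can do the same inspection on the re-downloaded file. This is a rough sketch, not a full encoding detector: the function names are hypothetical, and it only looks for a charset= declaration near the top of the file and checks whether the bytes form valid UTF-8.

```python
import re


def declared_charset(raw: bytes):
    """Return the charset named in the first 4 KB of the file, or None."""
    match = re.search(
        rb"charset\s*=\s*[\"']?([A-Za-z0-9_.:-]+)", raw[:4096], re.IGNORECASE
    )
    return match.group(1).decode("ascii").lower() if match else None


def report(raw: bytes) -> dict:
    """Summarise what can be learned about a page from its raw bytes."""
    try:
        raw.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False
    return {
        "declared": declared_charset(raw),
        # Bytes above 127 are where em dashes, curly quotes and accents live.
        "non_ascii_bytes": sum(byte > 127 for byte in raw),
        "valid_utf8": valid_utf8,
    }


page = (
    b'<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">'
    b"<p>caf\xc3\xa9</p>"
)
print(report(page))
```

To use it on a real file, read the file in binary mode first, for example report(open("downloaded.html", "rb").read()), where the file name is only a placeholder. A page with non-ASCII bytes and no declared charset is the case most likely to display differently from disk and from the Web.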

Corentin
 
little_creature

Hiya,
I would recommend that you learn HTML. HTML itself is really very
easy. Word adds a lot of clutter when it generates HTML files, which can
result in files 10 times larger. The encoding explanation sounds reasonable,
particularly if the server is PC-based.
As a standard I use ISO 8859-2 encoding (as I create the files on a PC
in a Central European language). For every special character I need, I
use the alternative character code, such as:
non-breaking space  
λ
ndash &#150
I have no problems reading my files on PC or Mac.
 
little_creature

OK, as far as I can see, this web interface translated the codes instead of
showing them. It should have been:
non-breaking space: &nbsp; [ampersand followed by nbsp and semicolon,
with no spaces]
λ
ndash &#150
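
To see which character each code stands for, Python's standard html module can decode them. This is just a checking aid, and the helper name decode_refs is made up. Note that &#150; is a Windows-1252 position rather than a Unicode one. HTML5 parsing rules map it to the en dash, which is why browsers show it, but &ndash; or &#8211; are the portable spellings. The named entity for the Greek letter is &lambda;.

```python
import html


def decode_refs(refs):
    """Map each character reference to the character it produces."""
    return {ref: html.unescape(ref) for ref in refs}


codes = ["&nbsp;", "&lambda;", "&ndash;", "&#8211;", "&#150;"]
for ref, char in decode_refs(codes).items():
    # Print the code point, since a non-breaking space is invisible.
    print(f"{ref:10} -> U+{ord(char):04X}")
```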
 
Corentin Cras-Méneur

little_creature said:
Hiya,
I would recommend that you learn HTML. HTML itself is really very
easy. Word adds a lot of clutter when it generates HTML files, which can
result in files 10 times larger. The encoding explanation sounds reasonable,
particularly if the server is PC-based.

I couldn't agree more. Most of the time you'll be better off this way.
You could also use simple apps to create the HTML for you (like N|Vu,
free):
http://www.nvu.com/index.php


Corentin
 
Phillip Jones

Yes, find some simple, inexpensive HTML editor if you're just learning the
ropes.

However, once you have your teeth ground to a fine edge and wish to
do some serious website design, my suggestion for the best "big gun" is
Dreamweaver.

It doesn't come cheap, but it has a great what-you-see-is-what-you-get
editing mode, and you can also have it proof the code to see if it's correct.

It deals with
HTML
XML
PHP
JavaScript and other types of files.

And <BIG GRIN HERE> it has one indispensable feature in the tools menu: "Fix
Word HTML". Another is "Fix HTML" (this fixes any HTML mistakes,
whether generated by MS or not).

--
------------------------------------------------------------------------
Phillip M. Jones, CET |LIFE MEMBER: VPEA ETA-I, NESDA, ISCET, Sterling
616 Liberty Street |Who's Who. PHONE:276-632-5045, FAX:276-632-0868
Martinsville Va 24112 |[email protected], ICQ11269732, AIM pjonescet
------------------------------------------------------------------------

If it's "fixed", don't "break it"!

mailto:p[email protected]

<http://www.kimbanet.com/~pjones/default.htm>
<http://www.kimbanet.com/~pjones/90th_Birthday/index.htm>
<http://www.kimbanet.com/~pjones/Fulcher/default.html>
<http://www.kimbanet.com/~pjones/Harris/default.htm>
<http://www.kimbanet.com/~pjones/Jones/default.htm>

<http://www.vpea.org>
 
Corentin Cras-Méneur

[...]
However, once you have your teeth ground to a fine edge and wish
to do some serious website design, my suggestion for the best "big
gun" is Dreamweaver.

It doesn't come cheap, but it has a great what-you-see-is-what-you-get
editing mode, and you can also have it proof the code to see if it's
correct.

No doubt. DW is the best out there. Unfortunately, as you mentioned,
it doesn't come cheap (unless you qualify for academic pricing).
Corentin
 
Phillip Jones

That's why I said it wasn't cheap and that you shouldn't move to it until
you're really getting serious about designing websites.

I also noted that there are probably inexpensive HTML editors for those
just getting started.
 
Dan

By the way, I do use SeaMonkey for some stuff, but I get the same
problems with quotation marks, dashes and accents that I get with
Word. So I guess that's non-standard, too. <Sigh.>
 
Corentin Cras-Méneur

Dan said:
By the way, I do use SeaMonkey for some stuff, but I get the same
problems with quotation marks, dashes and accents that I get with
Word. So I guess that's non-standard, too. <Sigh.>

Some of them could be.
I use BBEdit to automatically convert them to proper HTML entities
instead. BBEdit is a great tool when you're becoming serious about
HTML (meaning when you want to work with code directly). The free
version (TextWrangler) might have that function as well; I haven't
tested that.


Corentin
 
little_creature

Hi guys,
I was also advised to use DW when I started, but after downloading
the demo I found it easier to learn HTML. It took me just one
evening. The next day I had to move on to CSS, which I learned by trial
and error, but the main thing is that I can read my pages on PC and Mac
and they look more or less similar.

I replace all non-standard characters manually. This makes me read
the text as well and see whether I wrote everything correctly. I use
floating text (adjusted according to window size) and I hate
one-character prepositions or numbers being left at the end of a line, so
basically I insert a non-breaking space :)
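
The non-breaking-space habit described above could also be automated. The following is a minimal sketch under assumptions: the function name bind_short_words is invented, and it must only be run on plain text before markup is added, because it would also match a one-letter tag name such as the a in a link tag.

```python
import re

# A one-character word, or a run of digits, followed by a single space
# and then more text on the same line.
_SHORT = re.compile(r"\b(\w|\d+) (?=\S)")


def bind_short_words(text: str) -> str:
    """Glue one-character words and numbers to the word that follows."""
    return _SHORT.sub(r"\1&nbsp;", text)


print(bind_short_words("I saw a cat in 15 minutes"))
```

Doing it by hand has the side benefit mentioned above of forcing a proofread, so a script like this is a convenience rather than a replacement.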

For more non-standard characters, see for example:
http://www.tntluoma.com/sidebars/codes/
or Google.
 
John McGhie

Hi Dan:

Personally, I am unwilling to spend lots of money on Dreamweaver, because I
just don't use it often enough.

What I usually do is save directly out of Word, but I set the encoding to
UTF-8. That will support almost any character in the known universe :)

Go to File>Save as in Word, (or File>Save as Web Page...) and make sure you
click the Web Options button on the dialog that appears.

On the Encoding tab, set the encoding to Unicode (UTF-8).

Word will then write an "Encoding" line into the top of the HTML, which will
enable any browser to correctly recognise the characters in the file.
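
For pages built outside Word, the same idea is to put a charset declaration in the head and save the bytes in that same encoding. The sketch below assumes the conventional meta form of the declaration. The exact markup Word writes may differ, and the function name build_page is hypothetical.

```python
def build_page(title: str, body_html: str) -> bytes:
    """Return a small HTML page as UTF-8 bytes with a matching declaration."""
    page = (
        "<html>\n<head>\n"
        # The declaration must name the encoding the bytes are really in.
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">\n'
        f"<title>{title}</title>\n"
        "</head>\n<body>\n"
        f"{body_html}\n"
        "</body>\n</html>\n"
    )
    return page.encode("utf-8")


raw = build_page("Test", "<p>caf\u00e9 \u2014 \u201cquoted\u201d</p>")
print(raw.decode("utf-8"))
```

Write the returned bytes to disk in binary mode so that nothing re-encodes them on the way out.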

You can achieve the same thing with various other encodings, but all of them
have some disadvantage. If you use a Macintosh encoding, Windows browsers
will play up. If you use a Windows encoding, Mac browsers may struggle. If
you use any of the others, you may get variable results.

Unicode UTF-8 will work properly in all but a tiny fraction of browsers on
very old computers :)

Now: just for completeness, I recommend that you avoid "Unicode" and
"Unicode (Big endian)". The former uses 16-bit characters for everything,
which may keep the purists happy, but it results in files almost exactly
twice the size. The latter uses a reversed byte-order, which is technically
OK but may lead to spectacular failures with some of the less full-featured
browsers.
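
The byte-order point can be seen directly in Python. This is an illustrative sketch only. The helper name sniff_utf16 is made up, and it checks for nothing beyond the two UTF-16 byte-order marks.

```python
import codecs


def sniff_utf16(raw: bytes):
    """Guess the UTF-16 byte order from a leading byte-order mark, if any."""
    if raw.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if raw.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    return None


little = "A".encode("utf-16-le")  # low byte first
big = "A".encode("utf-16-be")     # high byte first
print(little, big)

# Reading little-endian bytes as big-endian gives a different character.
wrong = little.decode("utf-16-be")
print(f"U+{ord(wrong):04X}")

# With a byte-order mark in front, the reader can tell which is which.
print(sniff_utf16(codecs.BOM_UTF16_BE + big))
```

UTF-8 has no byte order to get wrong, which is one reason it travels better between machines.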

Cheers


--
Don't wait for your answer, click here: http://www.word.mvps.org/

Please reply in the group. Please do NOT email me unless I ask you to.

John McGhie, Consultant Technical Writer
McGhie Information Engineering Pty Ltd
http://jgmcghie.fastmail.com.au/
Sydney, Australia. S33°53'34.20 E151°14'54.50
+61 4 1209 1410, mailto:[email protected]
 
Corentin Cras-Méneur

[...]
What I usually do is save directly out of Word, but I set the
encoding to UTF-8. That will support almost any character in the
known universe :)

(that would be UTF-16 John ;-) )


Corentin
 
Phillip Jones

I have to maintain a website for a service association (Virginia
Professional Electronics Association), and not two weeks go by that I
don't have to tweak something on the site. I even created an
HTML-based bylaws page which has a clickable table of contents.

Just the day before yesterday I had to create a photo album of 165 pictures
taken at the convention. It took Dreamweaver and Fireworks all of 30
seconds to generate all the HTML and JavaScript code as well as create
thumbnails, and about another 5 seconds to update the index page with a
link. It then took Interarchy, on the poor 768K DSL connection, about half
an hour to upload everything and another half hour to do a mirror download
backup.

 
Peter Jamieson

As I understood it, UTF-8 and UTF-16 are both just encodings primarily
intended for compression; either of them can be used to encode any Unicode
character. Is that not the case?

Peter Jamieson

 
John McGhie

Hi Corentin:

No, I did mean UTF-8 :)

UTF-16 plays up on many applications, because it relies upon a byte-order
mark which often gets screwed up :)

If I want it to "just work" anywhere, on anything, I use UTF-8.

Both support the same range of characters, but UTF-8 is half the size and
far more common, so more applications support it :) The majority of
www.word.mvps.org is done in UTF-8, and the bits that aren't are simply
because I haven't found them and changed them yet :)

Cheers

 
John McGhie

Hi Peter:

Yes, that's my understanding too.

UTF-8 expresses all ASCII characters as a single byte and uses lead
bytes to express everything else as sequences of two to four bytes.
Since the majority of characters in English text ARE ASCII characters, it's
about half the size.

UTF-16 encodes most characters as 16 bits (two bytes), and the rarer ones
as four bytes, and is thus close to double the size for English text. And
because it can be either "Big endian" or "Little endian", it relies on the
recipient application getting the byte order correct.

Things can (and do...) go wrong along the way and one can get some problems
with badly-coded applications.

I believe that Asian applications will do better with UTF-16 because the
majority of their characters take three bytes in UTF-8 but only two in
UTF-16. I'm going to China in a couple of weeks, so I will let you know
when I get back :)
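
A quick way to check the size comparison is to count the bytes each encoding produces. This is a small sketch with a made-up helper name, sizes. The UTF-16 figure leaves out the two-byte byte-order mark that a saved file would normally carry.

```python
def sizes(text: str) -> dict:
    """Byte counts for the same text in UTF-8 and UTF-16 (no byte-order mark)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
    }


samples = {
    "English": "hello",
    "accented": "caf\u00e9",
    "Chinese": "\u4e2d\u6587",
    "emoji": "\U0001F600",
}
for label, text in samples.items():
    print(label, sizes(text))
```

English text is half the size in UTF-8, while the two Chinese characters take six bytes in UTF-8 against four in UTF-16.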

Cheers

 
