Open html as source

Tobias Weber

Hi,
how do I get Word X to open a .html file containing UTF-8 as plain text?
I always either get rendered html, am not allowed to select the file, or
have garbled characters :-(
 
CyberTaz

Am not sure about Word X, but the first thing I'd try is opening the file as
rendered html, then use View>HTML Source. May not be there in X, though.

HTH |:>)
Bob Jones
[MVP] Office:Mac
 
Tobias Weber

CyberTaz said:
Am not sure about Word X, but the first thing I'd try is opening the file as
rendered html, then use View>HTML Source. May not be there in X, though.

It's there but doesn't recognize the <meta utf8> and there's no way to
change it manually.
 
John McGhie [MVP Word, Word Mac]

Hi Tobias:

Well, Word does not need to recognise the "Meta tag", it's the CharSet tag
it needs to get hold of. It should recognise that OK, provided that the
content really is UTF-8.

And there is a way around it: Set Word>Preferences>General> "Confirm
conversions at open" to ON, then use File>Open (MUST be from within Word) to
open the file. You will then get a dialog asking you what format the file
is in. Choose UTF-8 at that point.
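
The precondition above ("provided that the content really is UTF-8") can be checked outside Word. The following is only a minimal sketch in Python, not anything Word does itself: the function name `inspect_html_bytes` and the sample markup are hypothetical, and the charset search is a rough regular expression rather than a real HTML parser.

```python
import codecs
import re


def inspect_html_bytes(data: bytes) -> dict:
    """Report BOM presence, UTF-8 validity, and the declared charset (if any)."""
    has_bom = data.startswith(codecs.BOM_UTF8)

    # Strict decoding raises UnicodeDecodeError on any invalid byte sequence.
    try:
        data.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False

    # Rough search for charset=... near the top of the file (assumption:
    # the declaration sits within the first 2048 bytes).
    match = re.search(rb'charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)',
                      data[:2048], re.IGNORECASE)
    declared = match.group(1).decode("ascii").lower() if match else None

    return {"has_bom": has_bom,
            "valid_utf8": valid_utf8,
            "declared_charset": declared}


# Hypothetical sample: a charset declaration plus some Japanese text, no BOM.
sample = ('<meta http-equiv="content-type" '
          'content="text/html; charset=utf-8"><p>日本語</p>').encode("utf-8")
report = inspect_html_bytes(sample)
print(report)
```

For a real file, read it with `open(path, "rb").read()` and pass the bytes in. A file that reports `valid_utf8` as true and `has_bom` as false is the "UTF-8 sans BOM" case.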

Cheers

--

Please reply in the group. Please do NOT email me unless I ask you to.

http://jgmcghie.fastmail.com.au/

John McGhie, Consultant Technical Writer
McGhie Information Engineering Pty Ltd
Sydney, Australia. GMT + 10 Hrs
+61 4 1209 1410, mailto:[email protected]
 
Tobias Weber

John McGhie said:
Well, Word does not need to recognise the "Meta tag", it's the CharSet tag

I'm sure we both mean <meta http-equiv="content-type"
content="text/html; charset=utf-8">

John McGhie said:
it needs to get hold of. It should recognise that OK, provided that the

It does when displaying rendered html. "View source" shows wide
characters as two symbols, so I suppose it forgot that it's UTF-8.

John McGhie said:
content really is UTF-8.

It is, although sans BOM.

John McGhie said:
And there is a way around it: Set Word>Preferences>General> "Confirm
conversions at open" to ON, then use File>Open (MUST be from within Word) to

That's what I was looking for. Thanks!

John McGhie said:
open the file. You will then get a dialog asking you what format the file
is in. Choose UTF-8 at that point.

There is no UTF-8 in the list, only "Unicode Text", which apparently
expects UTF-16 as my document comes out as only underscores.
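
If the "Unicode Text" converter really does expect UTF-16, one thing to try is re-encoding the file before opening it. The sketch below is in Python and rests on assumptions: that the converter wants UTF-16 with a byte order mark, and that big-endian is an acceptable byte order. Neither is confirmed for Word X, and the function name is hypothetical. It also cannot help if Word X is unable to display the characters in the first place.

```python
import codecs


def utf8_to_utf16_with_bom(data: bytes) -> bytes:
    """Re-encode UTF-8 bytes as big-endian UTF-16 with a leading BOM."""
    # "utf-8-sig" accepts input with or without a UTF-8 BOM and drops it.
    text = data.decode("utf-8-sig")
    # Write the BOM explicitly so the byte order does not depend on the
    # machine running this script (assumption: big-endian is acceptable).
    return codecs.BOM_UTF16_BE + text.encode("utf-16-be")


converted = utf8_to_utf16_with_bom("日本語".encode("utf-8"))
print(len(converted))  # 2 BOM bytes + 3 characters * 2 bytes each
```

To convert a file, read it in binary mode, pass the bytes through the function, and write the result to a new file in binary mode, leaving the original untouched.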
 
Daiya Mitchell

Why not right-click and try opening it in Safari or Firefox and view
source? If you only need the plain text, surely there's no need for
this to be a Word document.
 
Tobias Weber

Daiya Mitchell said:
source? If you only need the plain text, surely there's no need for
this to be a Word document.

But then there'd be no need for me to even ask, would there?

Actually I want to use the "Track Changes" feature, since FileMerge only
does ASCII. Apparently Word is also challenged with Unicode (pasting
rich text from Cocoa apps doesn't work, either) so I'll have to make do
with TkDiff.
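
For comparing two versions of a UTF-8 file outside Word, a scripted diff is another option. This is only a sketch using Python's standard `difflib`, not a replacement for Track Changes; the function name, file labels and sample strings are hypothetical.

```python
import difflib


def unified_diff_text(old: str, new: str,
                      old_name: str = "old.html",
                      new_name: str = "new.html") -> str:
    """Return a unified diff of two already-decoded Unicode strings."""
    diff = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=old_name,
        tofile=new_name,
    )
    return "".join(diff)


# Hypothetical sample: two small snippets that differ in one Japanese line.
old = "<p>こんにちは</p>\n<p>世界</p>\n"
new = "<p>こんにちは</p>\n<p>日本</p>\n"
result = unified_diff_text(old, new)
print(result)
```

With real files, open each with `encoding="utf-8"` so the comparison works on characters rather than raw bytes.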

But I still would like to KNOW ;-)
 
Elliott Roper

Tobias Weber said:
But then there'd be no need for me to even ask, would there?

Actually I want to use the "Track Changes" feature, since FileMerge only
does ascii. Apparently Word is also challenged with Unicode (pasting
rich Text from Cocoa apps doesn't work, either) so I'll have to make do
with TkDiff.

But I still would like to KNOW ;-)

You could try this sneaky slimy work-around:
view source in Safari.
copy and paste unformatted into Word
when you finish faffing about, save as text only
I tried it on a fairly complex pile of html lying about on the web and
it worked like a charm.
I'd prefer to edit html in emacs, but it is nice to know you can be an
idiot when you want to ;-)
 
Tobias Weber

Tobias Weber said:
does ascii. Apparently Word is also challenged with Unicode (pasting
rich Text from Cocoa apps doesn't work, either) so I'll have to make do

Elliott Roper said:
view source in Safari.
copy and paste unformatted into Word

Notice something? Oh, and when saying Unicode I mean Japanese.
 
John McGhie [MVP Word, Word Mac]

Hi Tobias:

OK, so Open as "Text" and see if you can see what is wrong with the header:
change it, re-save it...

I have a horrible feeling you are running into a built-in limitation of
Word X -- Word X can't display most Unicode.

It would appear that the document might be coded for a double-byte character
set (e.g. a Japanese font). If so, those underscores show that it's
recognising the charset tag perfectly, but Word X can't display those
characters! Word 2004 "can".

I see in a second post you admitted the thing is in Japanese :) Sorry:
You are going to struggle with that in Word X. Time to upgrade (wait for
2008 -- it will be further improved in its ability to do HTML-y and
Unicode-y things).

Cheers

 
Tobias Weber

John McGhie said:
It would appear that the document might be coded for a double-byte character
set (e.g. a Japanese font). If so, those underscores show that it's
recognising the charset tag perfectly, but Word X can't display those

When rendering the html everything comes out fine, then choosing "View
Source" I get correct html code and English text but garbled Japanese
(two symbols per character; I mentioned that).

But directly opening as some variation of text will display *everything*
as underscores.

John McGhie said:
characters! Word 2004 "can".

I see in a second post you admitted the thing is in Japanese :) Sorry:
You are going to struggle with that in Word X. Time to upgrade (wait for

If I change the account's locale to Japanese Word X will expect it and
change fonts accordingly if I paste from a Cocoa app (like Safari's
"View Source"). Viable workaround.

Opening as (Unicode) text still won't work.

John McGhie said:
2008 -- it will be further improved in its ability to do HTML-y and
Unicode-y things).

s/improved/fixed/

I read that 2008 will still come in different versions for "optimal"
localization, i.e. I'd either have to get it in each language I use
(German, English and Japanese) or compromise on language-specific
features. Currently I have both German and US versions of Office X
installed in folders which are only visible to one account each.
Otherwise they would overwrite each other's settings and double clicking
a file would launch one at random...
 
John McGhie [MVP Word, Word Mac]

Hi Tobias:

I wouldn't believe everything you read on that subject :)

I suspect Mac Word 2008 will be the same as Win Word 2007: There is only
one version of the application. The "localisation" is a set of external
files. In other words, "English" is just another localisation. They ship
English (All flavours), French, Spanish and one or two others on the
Application DVD. If you want any of the others, you order the
Multi-Language User Interface pack, another DVD which contains about 40
different languages.

I see no reason why they couldn't use the same set of localisation files for
both the PC and Mac versions: that would just be sensible software design.

But then again: you're reading this too, aren't you? Maybe not believe
this either :)

We'll all have to wait and see!

 