I hate Word X's HTML!

Dave K.

Hey gang-

I hate the HTML that Word in Office X generates- I guess it's actually
XML or something. It's not compatible with my Web Page editor among
other things.

Is there any way to make it generate standard HTML the way it did in
Office 98 for Mac?

Dave
 
Jim Gordon

Hi Dave,

Why do you want to edit the HTML with XML and CSS that Word's Save As Web
Page makes? The whole idea of Word's HTML is to make web pages that display
as closely as possible to Word documents and also so the web pages can be
opened and edited in Word as Word documents. This is called
"round-tripping."

If you want to hand code, you'd be far better off saving as RTF from Word
and then opening the file in your web authoring software. Another option is
to copy and paste from Word into your web authoring software.

You'll find that most applications that create machine-generated code
produce HTML files that are complex and often difficult, if not impossible,
to edit by hand. I recommend you don't waste time trying.

Microsoft FrontPage 2003 (Windows only) and DreamWeaver (Mac & Windows) each
have routines that will strip Word's machine generated code so that it can
be edited by hand. If you use this feature you won't have a Word document
any more.

-Jim Gordon
Mac MVP

All responses should be made to this newsgroup within the same thread.
Thanks.

About Microsoft MVPs:
http://www.mvps.org/

Search for help with the free Google search Excel add-in:
<http://www.rondebruin.nl/Google.htm>
 
Dave K.

Jim Gordon said:
Hi Dave,

Why do you want to edit the HTML with XML and CSS that Word's Save As Web
Page makes?

Jim-

Because once I've converted the page to HTML, I then add navigation
features, buttons, etc. to incorporate it into my site.

Jim Gordon said:
If you want to hand code, you'd be far better off saving as rtf from word
and then opening the file in your web authoring software.

Alas- RTF doesn't seem to support advanced features like tables, etc.,
that are contained in the Word pages I'm converting. For example, the
tables didn't display in Safari when I did a test conversion to RTF.

Jim Gordon said:
Another option is
to copy and paste from Word into your web authoring software.

Again- goodbye tables.

Jim Gordon said:
You find most applications that create machine generated code will create
HTML files that are complex and often difficult, if not impossible, to code
by hand. I recommend you don't waste time trying.

Believe me- I wouldn't if I could avoid it! ;)

Jim Gordon said:
Microsoft FrontPage 2003 (Windows only) and DreamWeaver (Mac & Windows) each
have routines that will strip Word's machine generated code so that it can
be edited by hand.

Do you know offhand if Adobe GoLive has this feature?

I was hoping to use the much-less-expensive FreeWay web design program,
but it doesn't import existing sites, period.

Jim Gordon said:
If you use this feature you won't have a Word document
any more.

Again- the Word doc is a throw-away that is sent to me from a different
department. My only concern is putting it on the web.

Thanks again!

Dave
 
Len Ford

I just created a Word HTML page with a table, centered text, an image
from Clip Art, and some other such junk. I opened the file in GoLive
6 and did the following:

1. In the Layout Window, I deleted all blue tags that had a lot of
(endif, if, else) stuff in it. When viewed in Safari, the paragraphs
were no longer separated by a blank line (standard <p> formatting
"appeared" to be lost)

2. Went into the Source Window and removed all the CSS stuff at the
top of the document. Upon viewing the document in Safari, the
formatting was identical to the original Word HTML file. It's not
elegant, and FrontPage Extensions are NOT used in Word documents, so
the FrontPage extension strippers don't work.

3. Go to http://studio.adobe.com/, sign up for a user name, go
to the GoLive page, click on the forum, and see if someone can answer
your question better. I'm sure it's possible, but I'm not sure who
(if anyone) has a plug-in to remove Word's excessive XML stuff.
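
For anyone who would rather script steps 1 and 2 than click through them, here is a rough Python sketch. It assumes the "blue tags" are Word's conditional comments (the `[if ...]` / `[endif]` markers) and that the CSS lives in `<style>` blocks; the function name `strip_word_markup` is made up for this example, and regular expressions are only a crude stand-in for a real HTML parser.

```python
import re

# Hidden conditional blocks, e.g. <!--[if gte mso 9]> ... <![endif]-->
# (everything inside is Word-only, so the whole block is dropped).
HIDDEN_CONDITIONAL = re.compile(
    r"<!--\[if[^\]]*\]>.*?<!\[endif\]-->", re.DOTALL | re.IGNORECASE
)

# Revealed markers, e.g. <![if !supportLists]> and <![endif]>
# (only the markers are dropped; the content between them is kept).
REVEALED_MARKER = re.compile(r"<!\[(?:if[^\]]*|endif)\]>", re.IGNORECASE)

# Embedded style sheets, i.e. "the CSS stuff at the top of the document".
STYLE_BLOCK = re.compile(r"<style\b[^>]*>.*?</style>", re.DOTALL | re.IGNORECASE)


def strip_word_markup(html: str) -> str:
    """Remove conditional comments and <style> blocks from Word HTML."""
    html = HIDDEN_CONDITIONAL.sub("", html)  # must run before the marker pass
    html = REVEALED_MARKER.sub("", html)
    html = STYLE_BLOCK.sub("", html)
    return html


if __name__ == "__main__":
    sample = (
        "<html><head><style><!-- p.MsoNormal {margin:0} --></style></head>"
        "<body><!--[if gte mso 9]><xml><o:x/></xml><![endif]-->"
        "<p><![if !supportLists]>1. <![endif]>Item</p></body></html>"
    )
    print(strip_word_markup(sample))
```

This leaves Word's class names and inline `style` attributes in place, so the result is lighter but still not what you would write by hand.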

HTH

Len
 
Phillip M. Jones, C.E.T.

I use Macromedia DreamWeaver 2004 to build the web pages on my association's
website. One feature it has is "Correct HTML"; when you click on it, one of the
choices is a specific setting to correct Microsoft Word HTML. I used to use
Adobe's PageManager, and if I opened an HTML document created in Word, the
document would show up with its formatting intact (bold, italic, underline;
centered, right-, left-, or fully justified). But there would be godzillions of
strange red tags. If the pages were uploaded to the server without correction,
these red tags would show up. It even has a menu choice to convert HTML to XML.

--
---------------------------------------------------------------------------
Phillip M. Jones, CET |MEMBER:VPEA (LIFE) ETA-I, NESDA,ISCET, Sterling
616 Liberty Street |Who's Who. PHONE:276-632-5045, FAX:276-632-0868
Martinsville Va 24112-1809 |[email protected], ICQ11269732, AIM pjonescet
---------------------------------------------------------------------------

If it's "fixed", don't "break it"!

<http://www.kimbanet.com/~pjones/default.htm>
<http://home.kimbanet.com/~pjones/birthday/index.htm>
<http://vpea.exis.net>
 
Jerry Stratton

Dave K. said:
I hate the HTML that Word in Office X generates- I guess it's actually
XML or something. It's not compatible with my Web Page editor among
other things.

Is there any way to make it generate standard HTML the way it did in
Office 98 for Mac?

Have you tried telling it to save only display data? There should be a
Web Options setting when you go to save the page that lets you do this. It
vastly cuts down on the extra stuff.

Jerry
 
Jim Gordon

Hi again,

I see others have posted some useful suggestions (yea!).

As I mentioned, DreamWeaver has a feature that cleans up Word's HTML. Sounds
to me like this is the best way to go.

Another option is to use Word itself. Put the buttons and other objects into
Word. To add a hyperlink to any object simply select it and then use Apple+K
to display the insert hyperlink dialog box. Then save the document as a web
page.

Personally, I prefer to use Excel to make tables for the web. If your tables
have custom formatting, Excel makes the best web tables that I know of when
you save as a web page from Excel.

-Jim Gordon
Mac MVP

 
Corentin Cras-Méneur

Jim Gordon said:
Hi again,

I see others have posted some useful suggestions (yea!).

:)))

Jim Gordon said:
As I mentioned, DreamWeaver has a feature that cleans up Word's HTML. Sounds
to me like this is the best way to go.

Yep, but you must have DreamWeaver :-\
I understand that Word is desperately trying to produce a page that will
look exactly like its Word-document counterpart, but the problem is that
the resulting pages don't even pass the W3C validation tests :-\
As far as I'm concerned, Word should create a clean page and a decent
.css to go along with it. I am really not a big fan of the pseudo-XML
that currently gets generated. I want standards !!! ;-)))

Jim Gordon said:
Another option is to use Word itself. Put the buttons and other objects into
Word. To add a hyperlink to any object simply select it and then use Apple+K
to display the insert hyperlink dialog box. Then save the document as a web
page.

Personally, I prefer to use Excel to make tables for the web. If your tables
have custom formatting, Excel makes the best web tables that I know of when
you save as a web page from Excel.

This is the part that puzzles me the most. Looking at the structure of
the Excel- and PPT-generated HTML pages, the code seems much cleaner.
What's wrong with Word ???


Corentin


PS I shouldn't get started on the subject... html code from Word is one
of my "favorite" targets when it comes to the features I would like
changed/improved in Word.
 
Arno Wouters

Corentin Cras-Méneur said:
I understand that Word is desperately trying to produce a page that will
look exactly like its Word-document counterpart, but the problem is that
the resulting pages don't even pass the W3C validation tests :-\

There used to be a utility that cleans up MS's messy HTML. It is called
Tidy and was available from
<http://www.geocities.com/SiliconValley/1057/tidy.html>. I haven't used
it for years, but as far as I recall it produces valid and readable HTML,
although it doesn't make the markup more efficient.
 
Arno Wouters

Arno Wouters said:
There used to be a utility that cleans up MS's messy HTML. It is called
Tidy and available from
<http://www.geocities.com/SiliconValley/1057/tidy.html>. I haven't used
it for years but as far as I recall it produces valid and readable HTML
but it doesn't make the markup more efficient.

I just took a look at the "manual page" of Tidy
<http://www.w3.org/People/Raggett/tidy/>. It says:

"Tidy can now perform wonders on HTML saved from Microsoft Word 2000!
Word bulks out HTML files with stuff for round-tripping presentation
between HTML and Word. It will go to great pains to strip out all the
surplus stuff Microsoft Word 2000 inserts when you save Word documents
as 'Web pages'. Of course Tidy does a good job on Word'97 files as
well!"

As I suspect Word 2001 and Word X HTML to be close to Word 2000 HTML, and
Word 98 HTML close to Word 97 HTML, this sounds very promising! Haven't
tried it yet.

BTW, this utility is freeware.
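
If you want to run Tidy from a script, something like the Python sketch below might do. It assumes a command-line `tidy` is installed and on your PATH, and that your build supports the `word-2000`, `bare` and `clean` options (check `tidy -help-config`, since option names vary between versions). The function names and file names are made up for this example.

```python
import subprocess
from pathlib import Path


def tidy_command(source: Path, target: Path) -> list[str]:
    """Build a Tidy command line aimed at Word-generated HTML."""
    return [
        "tidy",
        "--word-2000", "yes",  # drop the surplus markup Word inserts
        "--bare", "yes",       # strip Microsoft-specific HTML
        "--clean", "yes",      # replace presentational markup with style rules
        "-quiet",
        "-output", str(target),
        str(source),
    ]


def clean_word_html(source: Path, target: Path) -> None:
    """Run Tidy; it exits with 0 (ok), 1 (warnings) or 2 (errors)."""
    result = subprocess.run(
        tidy_command(source, target), capture_output=True, text=True
    )
    if result.returncode >= 2:
        raise RuntimeError(f"tidy reported errors:\n{result.stderr}")


if __name__ == "__main__":
    # Hypothetical file names; replace with your own.
    clean_word_html(Path("word-export.html"), Path("cleaned.html"))
```

Writing to a separate output file keeps the original Word export around in case the cleaned version loses something you needed.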
 
Corentin Cras-Méneur

Hi Arno,

Arno Wouters said:
"Tidy can now perform wonders on HTML saved from Microsoft Word 2000!
Word bulks out HTML files with stuff for round-tripping presentation
between HTML and Word. It will go to great pains to strip out all the
surplus stuff Microsoft Word 2000 inserts when you save Word documents
as 'Web pages'. Of course Tidy does a good job on Word'97 files as
well!"

As I suspect Word 2001 and Word X HTML to be close to Word 2000 HTML, and
Word 98 HTML close to Word 97 HTML, this sounds very promising! Haven't
tried it yet.


HTML Tidy is a long-standing, very well-known and efficient application for
cleaning up HTML. I've used it for years (but I now favor the clean-up
tool in BBEdit).


Corentin
 
Arno Wouters

Corentin Cras-Méneur said:
Tidy html is an old-time very well known and efficient application for
cleaning up html. I've used it for years (but I now favor the clean-up
tool in BBEdit).

Apart from the Tidy plug-in I can't find a cleaning tool for Word in
BBEdit (I only have Go-live, PageMill and Homepage cleaners).

Anyway, it sounds as if Tidy is the tool Dave is looking for!
 
Corentin Cras-Méneur [MVP]

Arno Wouters said:
Apart from the Tidy plug-in I can't find a cleaning tool for Word in
BBEdit (I only have Go-live, PageMill and Homepage cleaners).

BBEdit can help you fix malformed headers and tags (and Word does not
even generate a doctype declaration...). You have to use the Check
syntax tool.
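
If you don't have BBEdit, the missing doctype at least is easy to check for and patch yourself. Here is a small Python sketch; the function names are made up for this example, and HTML 4.01 Transitional is just my assumption of a reasonable doctype for this kind of page. Adding a doctype does not make the page valid on its own, it only gives a validator something to check against.

```python
import re

# Assumed choice for this sketch: the HTML 4.01 Transitional doctype.
HTML4_TRANSITIONAL = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" '
    '"http://www.w3.org/TR/html4/loose.dtd">'
)

# A doctype declaration, optionally preceded by whitespace.
DOCTYPE = re.compile(r"\s*<!DOCTYPE\s+html\b", re.IGNORECASE)


def has_doctype(html: str) -> bool:
    """Return True if the document starts with a doctype declaration."""
    return DOCTYPE.match(html) is not None


def add_doctype(html: str) -> str:
    """Prepend a doctype if the document has none; otherwise return it as-is."""
    if has_doctype(html):
        return html
    return HTML4_TRANSITIONAL + "\n" + html


if __name__ == "__main__":
    page = "<html><head><title>Test</title></head><body></body></html>"
    print(add_doctype(page))
```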


Arno Wouters said:
Anyway, it sounds as if Tidy is the tool Dave is looking for!

:))

Corentin
 
Arno Wouters

Corentin Cras-Méneur said:
BBEdit can help you fix malformed headers and tags (and Word does not
even generate a doctype declaration...). You have to use the Check
syntax tool.

Ah!

Thanks,

Arno.
 
