decent HTML output?

Timothy J. Luoma

I was unsurprised to see that Word2004 still creates the most bloated,
ugliest HTML imaginable.

Is there a filter out there that will clean out that crap and leave me
with something that looks like normal HTML, or (dare I say it?) XHTML???

TjL
 
Tim Murray

I was unsurprised to see that Word2004 still creates the most bloated,
ugliest HTML imaginable.

Is there a filter out there that will clean out that crap and leave me
with something that looks like normal HTML, or (dare I say it?) XHTML???

There is a plug-in for GoLive that strips it out. Not sure about DreamWeaver.
 
Timothy J. Luoma

Dreamweaver has a feature called "Clean up Word HTML" which allows you to
pick and choose options if desired.


Anyone know of a solution for less than $300 :-?

Absurd that they can't give a mode to produce clean HTML.

TjL
 
Corentin Cras-Méneur [MVP]

Beth Rosengard said:
Dreamweaver has a feature called "Clean up Word HTML" which allows you to
pick and choose options if desired.


It does. I still prefer copying the text in Word, using "paste as text"
in an HTML document in DreamWeaver, and taking care of the page layout
from there using CSS. Nothing beats it (quite unfortunately).

You can also try a simple copy and paste between the two apps. No matter
what, it is still better than trying to fix the HTML Word generates.
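
For anyone who wants to script the "paste as text" route instead of doing it in an editor, here is a minimal Python sketch. It assumes the Word text has already been saved as plain text with blank lines between paragraphs; the function name text_to_paragraphs and the class name "body-text" are made up for illustration and are not part of any tool mentioned in this thread.

```python
import html


def text_to_paragraphs(text: str, css_class: str = "body-text") -> str:
    """Wrap blank-line-separated blocks of plain text in <p> elements.

    css_class is a hypothetical class name; the layout itself is meant
    to live in a separate stylesheet.
    """
    blocks = text.replace("\r\n", "\n").split("\n\n")
    paragraphs = []
    for block in blocks:
        block = block.strip()
        if not block:
            continue  # skip the empty pieces left by runs of blank lines
        # Join wrapped lines into one line, then escape &, <, > and quotes.
        body = html.escape(" ".join(block.split()))
        paragraphs.append(f'<p class="{css_class}">{body}</p>')
    return "\n".join(paragraphs)


if __name__ == "__main__":
    sample = "Fish & chips\n\nSecond paragraph,\nwrapped over two lines."
    print(text_to_paragraphs(sample))
```

The output carries no inline formatting at all, so everything visual has to come from the stylesheet, which is the point of the approach.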


Corentin (GNKSA and W3C supporter)
 
JE McGimpsey

Timothy J. Luoma said:
Absurd that they can't give a mode to produce clean HTML.

I too would like "clean HTML". Please also let MacBU know you want this
by choosing Help/Send Feedback on Word.

Make sure you describe what you mean by "clean HTML" - e.g., HTML3.2?
HTML4? HTML4.01? XHTML 1.0? XHTML 1.1? XHTML 2? Strict? Transitional?
Basic? Frameset? Internal stylesheets? External stylesheets? CSS1? CSS2?
CSS3? Separate stylesheets for Print? Screen? PDA? Should it be WCAG
compliant? Section 508 compliant? CLF compliant? eEurope Plan compliant?

Also make sure you let them know how much more you would pay for an
update that provided a "clean HTML" solution. A reasonable estimate of
how many people in your company, field, etc. would purchase an update at
that price would help MacBU determine their expected return on
investment for this feature.

Word's HTML design was apparently done in such a way as to display
documents that look like printed Word documents in most browsers, and
can be round-tripped back into Word. Clean HTML obviously can't be
round-tripped, so that hasn't been a priority.
 
Phillip M. Jones, CE.T.

JE said:
I too would like "clean HTML". Please also let MacBU know you want this
by choosing Help/Send Feedback on Word.

Make sure you describe what you mean by "clean HTML" - e.g., HTML3.2?
HTML4? HTML4.01? XHTML 1.0? XHTML 1.1? XHTML 2? Strict? Transitional?
Basic? Frameset? Internal stylesheets? External stylesheets? CSS1? CSS2?
CSS3? Separate stylesheets for Print? Screen? PDA? Should it be WCAG
compliant? Section 508 compliant? CLF compliant? eEurope Plan compliant?

Also make sure you let them know how much more you would pay for an
update that provided a "clean HTML" solution. A reasonable estimate of
how many people in your company, field, etc. would purchase an update at
that price would help MacBU determine their expected return on
investment for this feature.

Word's HTML design was apparently done in such a way as to display
documents that look like printed Word documents in most browsers, and
can be round-tripped back into Word. Clean HTML obviously can't be
round-tripped, so that hasn't been a priority.

You can clean up the word stuff in HTML using Macromedia's Dreamweaver.
They even have a menu choice "fix Microsoft html".

--
---------------------------------------------------------------------------
Phillip M. Jones, CET |MEMBER:VPEA (LIFE) ETA-I, NESDA,ISCET, Sterling
616 Liberty Street |Who's Who. PHONE:276-632-5045, FAX:276-632-0868
Martinsville Va 24112-1809 |[email protected], ICQ11269732, AIM pjonescet
---------------------------------------------------------------------------

If it's "fixed", don't "break it"!

mailto:p[email protected]

<http://www.kimbanet.com/~pjones/default.htm>
<http://home.kimbanet.com/~pjones/birthday/index.htm>
<http://vpea.exis.net>
 
JE McGimpsey

Phillip M. Jones said:
You can clean up the word stuff in HTML using Macromedia's Dreamweaver.
They even have a menu choice "fix Microsoft html".

I haven't used a recent version, but IIRC, Dreamweaver cleaned up *some*
of the lousy HTML. I then had to run Tidy to get more, then clean the
rest up by hand. And XHTML strict wasn't an option - has that changed?
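
Part of that hand cleanup can be scripted. The Python sketch below is only a rough illustration and assumes the usual Word residue: conditional comments, namespaced tags such as <o:p>, class names starting with "Mso", and mso-* style declarations. The function name strip_word_markup is invented for this example. It uses regular expressions rather than a real parser, so it is not a replacement for Tidy and will not produce valid XHTML on its own.

```python
import re

# Patterns for Word-specific residue (assumed typical, not exhaustive).
_CONDITIONAL = re.compile(r"<!--\[if.*?<!\[endif\]-->", re.S | re.I)
_NAMESPACED_TAG = re.compile(r"</?[a-z]+:[a-z0-9]+[^>]*>", re.I)
_MSO_CLASS = re.compile(r"""\s+class=(["']?)Mso\w*\1""", re.I)
_MSO_DECL = re.compile(r"""mso-[\w-]+\s*:[^;"']*;?\s*""", re.I)
_EMPTY_STYLE = re.compile(r"""\s+style=(["'])\s*\1""", re.I)


def strip_word_markup(markup: str) -> str:
    """Remove common Word-only constructs from an HTML string."""
    markup = _CONDITIONAL.sub("", markup)     # <!--[if ...]> ... <![endif]-->
    markup = _NAMESPACED_TAG.sub("", markup)  # <o:p>, </o:p>, <w:...> tags
    markup = _MSO_CLASS.sub("", markup)       # class="MsoNormal" and friends
    markup = _MSO_DECL.sub("", markup)        # mso-* declarations in styles
    markup = _EMPTY_STYLE.sub("", markup)     # style="" left behind
    return markup


if __name__ == "__main__":
    dirty = ('<p class="MsoNormal" style="mso-pagination:none;color:red">'
             'Hi<o:p></o:p></p>')
    print(strip_word_markup(dirty))
```

Because the mso-* pattern is applied to the whole string, it would also eat body text that happens to look like a style declaration, which is one more reason to treat this as a first pass before Tidy and a manual check.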

I find it's easier to paste text into a GoLive template and use CSS2,
just like Word's styles (in fact, I have CSSs for my most frequently
used Word templates that use the same style names and produce nearly the
same presentation on screen or in print). Takes about 2 minutes for a 25
page doc.

The biggest problem I have is converting tables - while tables are quite
useful in Word, I can't use them in my web docs since my clients don't
allow them (and tables in web docs just generally suck, of course -
broken accessibility for the disabled or for alternate user agents,
slower loading, higher bandwidth, higher maintenance costs, reduced
flexibility, poor search engine performance, etc.).
 
Phillip M. Jones, CE.T.

JE said:
I haven't used a recent version, but IIRC, Dreamweaver cleaned up *some*
of the lousy HTML. I then had to run Tidy to get more, then clean the
rest up by hand. And XHTML strict wasn't an option - has that changed?

I find it's easier to paste text into a GoLive template and use CSS2,
just like Word's styles (in fact, I have CSSs for my most frequently
used Word templates that use the same style names and produce nearly the
same presentation on screen or in print). Takes about 2 minutes for a 25
page doc.

The biggest problem I have is converting tables - while tables are quite
useful in Word, I can't use them in my web docs since my clients don't
allow them (and tables in web docs just generally suck, of course -
broken accessibility for the disabled or for alternate user agents,
slower loading, higher bandwidth, higher maintenance costs, reduced
flexibility, poor search engine performance, etc.).

I use Dreamweaver 2004 for Mac, which came out in September. It uses the
standards current as of September. It is version 7.0, and there is a 7.01
update so that it works on 10.3.x.

 
JE McGimpsey

Phillip M. Jones said:
I use Dreamweaver 2004 for Mac, which came out in September. It uses the
standards current as of September. It is version 7.0, and there is a 7.01
update so that it works on 10.3.x.

So the Fix Microsoft HTML will produce standards-compliant code? Cool!
Which standards?

If it takes a Word doc and produces XHTML1.1 Strict code with an
external CSS2 style sheet, I'd have to take a close look at purchasing
it.
 
Phillip M. Jones, CE.T.

JE said:
So the Fix Microsoft HTML will produce standards-compliant code? Cool!
Which standards?

If it takes a Word doc and produces XHTML1.1 Strict code with an
external CSS2 style sheet, I'd have to take a close look at purchasing
it.

Go to the Macromedia site.

They let you download a trial version that you can play with for 30 days (or they
did when I first tried out DreamWeaver MX (6.0)). Might be worth a look if you have
the space on your hard drive.

 
John McGhie

Hi Timothy:

I was unsurprised to see that Word2004 still creates the most bloated,
ugliest HTML imaginable.

{Giggle} Word doesn't produce HTML and has never tried to do so. It's
producing XML. Specifically, Word Markup Language, which is a Word Document
encoded in the XML syntax. This stuff was never designed to be
human-readable, or pretty. If you don't like it, don't look :)

If you go to Web Options>Files on the Save As dialog and check the "Save
display information only" checkbox, the XML produced will be much lighter in
weight because it then strips out all of the Word document artefacts that
cannot be described in HTML.

Is there a filter out there that will clean out that crap and leave me
with something that looks like normal HTML, or (dare I say it?) XHTML???

If you have a copy of Windows handy, Microsoft's "HTML filter 2" application
does a grand job, and it's a free download.

The latest version (2003) of FrontPage cleans it up very nicely too.

However, someone suggested that copy in Word and Paste in an HTML editor was
still the best way to go, and that's what I also find.

Cheers

--

Please reply to the newsgroup to maintain the thread. Please do not email
me unless I ask you to.

John McGhie <[email protected]>
Consultant Technical Writer
Sydney, Australia +61 4 1209 1410
 
