Converting Word Table Cell Contents to HTML

Craig Petrie

Hi,
My client has a large table in Word 2003 (and will not change to another
method) that has formatted text in the cells, and I need to read and convert
each cell's formatted contents to simple HTML (no WordML) via VB.NET code. The
formatting includes bold, italic, superscript and subscript text, and bullet
points.

In my VB.NET app I currently open and read the Word table and output all
cell contents into an XML structure OK, but this strips out any formatting.
I just need to know how to read the formatted text inside a single cell and
convert it to HTML in VB.NET.

Can anyone suggest tools/components or code or examples that would help me
do this.

cheers,
Craig, New Zealand

I am using VB.net 2003 and Office Professional Edition 2003.
 
Peter Hewett

Hi Craig Petrie

You posted the same question to microsoft.public.word.vba.general on 14/5, to which I
suggested <What about copying the table to a new Word document then saving that document
as Filtered HTML?>

Did this help?

Cheers - Peter

 
Craig Petrie

Thanks for that reply Peter,
Unfortunately this did not work so well. Word includes a large amount of
WordML markup even though I saved it as Filtered HTML.


What I need to do is convert the contents of each cell on the fly in a
VB.NET application into clean (non-WordML) HTML. This is a requirement of the
specification and I am not allowed to change it.

cheers,
Craig
 
Howard Kaikow

Have you tried exporting the table to Excel and then saving the HTML from
Excel?
Maybe Excel will do better?
 
Craig Petrie

Thanks for the suggestion anyway Howard.

I tried this, but unfortunately it produced a lot of padding HTML that does
not work well.

I would still be keen to hear of any tool or technique that converts a
formatted Word fragment (the formatted contents of a table cell) to clean, generic HTML.

cheers
Craig
 
Howard Kaikow

I would not create HTML using Word.
Better to use FrontPage.

You might try passing the HTML thru an HTML filter such as HTML Tidy and see
what it comes up with.
Take a look at http://tidy.sourceforge.net/ for the HTML Tidy Project.
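HTML Tidy has options aimed at exactly this (`word-2000` and `bare` strip Word-generated markup). If an extra tool is not wanted, the same post-processing idea can be sketched with Python's standard `html.parser`: keep a small whitelist of tags matching the formatting Craig listed, drop every attribute (so `class=MsoNormal` and inline `style` disappear), and discard the contents of `<head>`, `<style>` and `<title>`. This is a minimal sketch under those assumptions, not Tidy's algorithm; the whitelist is an assumption, and the logic would port to VB.NET.

```python
from html import escape
from html.parser import HTMLParser

# Assumed whitelist: only the formatting mentioned in the original question.
ALLOWED_TAGS = {"p", "b", "strong", "i", "em", "sup", "sub", "ul", "ol", "li", "br"}
# Elements whose contents are dropped entirely (Word's CSS, title, etc.).
SKIP_CONTENT = {"head", "style", "script", "title"}
VOID_TAGS = {"br"}


class WordHtmlCleaner(HTMLParser):
    """Drops all attributes and any tag outside ALLOWED_TAGS, keeping text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <head>, <style>, ...

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_CONTENT:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in ALLOWED_TAGS:
            self.out.append(f"<{tag}>")  # attributes deliberately discarded

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/>; only void tags are kept.
        if self.skip_depth == 0 and tag in VOID_TAGS:
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in SKIP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.out.append(escape(data, quote=False))


def clean_word_html(html: str) -> str:
    """Return a stripped-down version of Word's "Web Page, Filtered" output."""
    parser = WordHtmlCleaner()
    parser.feed(html)
    parser.close()
    return "".join(parser.out).strip()
```

Fed the body of Word's filtered output, such as `<p class=MsoNormal><span class=CharStyle2>This</span> is a test.</p>`, this returns `<p>This is a test.</p>`. Note that formatting Word expresses only through a CSS class (for example a bold character style) is lost, because the class is dropped rather than translated.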
 
Mark Baird

If you save the document as "Web Page, Filtered" you will get HTML 4.0 with CSS 1.0, as shown at the end of this message. How generic do you need it? HTML 3.2 with inline styles? There are also several options in Word that change the way the HTML is exported for older browsers.

When you talk about getting WordML, you must be saving as XML. You could also save the file as WordML and write an XSLT to convert it to HTML, or write your own program to convert the WordML to HTML.
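The XSLT route swaps in easily for any XML tool; the sketch below uses Python's `xml.etree.ElementTree` instead of XSLT to show the mapping from Word 2003 XML (WordprocessingML) to HTML for a single table cell. It assumes the 2003 namespace `http://schemas.microsoft.com/office/word/2003/wordml`, reads only `w:b`, `w:i` and `w:vertAlign` from each run's `w:rPr`, and treats any paragraph with `w:pPr/w:listPr` as a bullet item (so numbered lists would also come out as `<ul>`). All other run and paragraph properties are ignored; the same walk would port to VB.NET's `System.Xml`.

```python
import xml.etree.ElementTree as ET
from html import escape

# Word 2003 XML (WordprocessingML) namespace, in ElementTree's {uri} form.
W = "{http://schemas.microsoft.com/office/word/2003/wordml}"


def _toggle_on(rpr, name):
    """True if a toggle such as <w:b/> is present and not switched off."""
    el = rpr.find(W + name) if rpr is not None else None
    return el is not None and el.get(W + "val", "on") not in ("off", "0", "false")


def run_to_html(run):
    """Convert one <w:r> run to HTML using bold, italic and vertAlign only."""
    text = escape("".join(t.text or "" for t in run.iter(W + "t")), quote=False)
    if not text:
        return ""
    rpr = run.find(W + "rPr")
    va = rpr.find(W + "vertAlign") if rpr is not None else None
    tag = {"superscript": "sup", "subscript": "sub"}.get(va.get(W + "val")) if va is not None else None
    if tag:
        text = f"<{tag}>{text}</{tag}>"
    if _toggle_on(rpr, "i"):
        text = f"<i>{text}</i>"
    if _toggle_on(rpr, "b"):
        text = f"<b>{text}</b>"
    return text


def cell_to_html(tc):
    """Convert one table cell (<w:tc>) to simple <p> and <ul><li> HTML."""
    parts, in_list = [], False
    for p in tc.findall(W + "p"):
        body = "".join(run_to_html(r) for r in p.iter(W + "r"))
        # Assumption: any list paragraph is rendered as a bullet.
        is_item = p.find(f"{W}pPr/{W}listPr") is not None
        if is_item and not in_list:
            parts.append("<ul>")
        elif not is_item and in_list:
            parts.append("</ul>")
        in_list = is_item
        parts.append(f"<li>{body}</li>" if is_item else f"<p>{body}</p>")
    if in_list:
        parts.append("</ul>")
    return "".join(parts)
```

Applied to each `w:tc` element found under `w:tbl/w:tr` in a document saved as XML, this yields one HTML fragment per cell.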

To read the formatting of a paragraph, you access the paragraph objects in the cell. There are many points of access to the paragraph formatting, such as those below. You would also need to read the font properties.

Selection.ParagraphFormat.Alignment
Selection.Font.Bold
Selection.Paragraphs(1).Alignment
Selection.Cells(1).Range.ParagraphFormat.Alignment
Selection.Cells(1).Range.Paragraphs(1).Alignment
ActiveDocument.Tables(1).Cell(1, 1).Range.ParagraphFormat.Alignment
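The object-model route can be sketched as two steps. First, in VB.NET, walk `Cell(r, c).Range.Characters` and record each character's `Font.Bold`, `Font.Italic`, `Font.Superscript` and `Font.Subscript` (reading per character avoids the `wdUndefined` value Word returns for a range with mixed formatting), plus a per-paragraph bullet flag, which might come from `Paragraph.Range.ListFormat.ListType`. Second, group consecutive identically formatted characters and wrap them in tags. Only the second step is shown here, in Python, with a hypothetical `Char` record standing in for the COM calls; it is a minimal sketch under those assumptions, and porting it to VB.NET is assumed to be straightforward.

```python
from dataclasses import dataclass
from html import escape
from itertools import groupby


# Hypothetical record: one Word character plus the Font flags read over COM.
@dataclass(frozen=True)
class Char:
    text: str
    bold: bool = False
    italic: bool = False
    superscript: bool = False
    subscript: bool = False


# Paragraph mark and end-of-cell marker that Word includes in a cell's range.
CONTROL_CHARS = {"\r", "\x07"}


def _format_key(c):
    return (c.bold, c.italic, c.superscript, c.subscript)


def chars_to_html(chars):
    """Merge consecutive characters with identical formatting and wrap each
    run in <sup>/<sub>, then <i>, then <b> tags."""
    visible = [c for c in chars if c.text not in CONTROL_CHARS]
    out = []
    for (bold, italic, sup, sub), group in groupby(visible, key=_format_key):
        html = escape("".join(c.text for c in group), quote=False)
        if sup:
            html = f"<sup>{html}</sup>"
        elif sub:
            html = f"<sub>{html}</sub>"
        if italic:
            html = f"<i>{html}</i>"
        if bold:
            html = f"<b>{html}</b>"
        out.append(html)
    return "".join(out)


def cell_to_html(paragraphs):
    """paragraphs: list of (is_bullet, [Char, ...]) pairs, one per paragraph
    in the cell. Consecutive bullet paragraphs share one <ul>."""
    parts, in_list = [], False
    for is_bullet, chars in paragraphs:
        body = chars_to_html(chars)
        if is_bullet and not in_list:
            parts.append("<ul>")
        elif not is_bullet and in_list:
            parts.append("</ul>")
        in_list = is_bullet
        parts.append(f"<li>{body}</li>" if is_bullet else f"<p>{body}</p>")
    if in_list:
        parts.append("</ul>")
    return "".join(parts)


if __name__ == "__main__":
    cell = [
        (False, [Char("H"), Char("2", subscript=True), Char("O "), Char("is "),
                 Char("water", bold=True), Char("\r")]),
        (True, [Char("first", italic=True), Char("\r")]),
        (True, [Char("second"), Char("\r"), Char("\x07")]),
    ]
    # Prints: <p>H<sub>2</sub>O is <b>water</b></p><ul><li><i>first</i></li><li>second</li></ul>
    print(cell_to_html(cell))
```

Walking characters one at a time over COM is slow for a large table; grouping could instead be done on Word's side by extending a Range while the formatting stays the same, at the cost of more complex VB.NET code.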

Exporting HTML from Word 2003 is fine for the average user who has more Word experience than HTML experience.

Mark Baird

<html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><meta name=Generator content="Microsoft Word 11 (filtered)"><title>This is a test</title><style><!--
/* Font Definitions */
@font-face
{font-family:Verdana;
panose-1:2 11 6 4 3 5 4 4 2 4;}
@font-face
{font-family:Tahoma;
panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:10.0pt;
font-family:Verdana;}
h1
{margin-top:12.0pt;
margin-right:0in;
margin-bottom:3.0pt;
margin-left:0in;
page-break-after:avoid;
font-size:16.0pt;
font-family:Arial;}
h2
{margin-top:12.0pt;
margin-right:0in;
margin-bottom:3.0pt;
margin-left:0in;
page-break-after:avoid;
font-size:14.0pt;
font-family:Arial;
font-style:italic;}
h3
{margin-top:12.0pt;
margin-right:0in;
margin-bottom:3.0pt;
margin-left:0in;
page-break-after:avoid;
font-size:13.0pt;
font-family:Arial;}
p.MsoToc1, li.MsoToc1, div.MsoToc1
{margin:0in;
margin-bottom:.0001pt;
font-size:10.0pt;
font-family:Verdana;}
a:link, span.MsoHyperlink
{color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{color:purple;
text-decoration:underline;}
p.MsoAcetate, li.MsoAcetate, div.MsoAcetate
{margin:0in;
margin-bottom:.0001pt;
font-size:8.0pt;
font-family:Tahoma;}
span.b
{font-family:"Courier New";
color:red;
font-weight:bold;
text-decoration:none none;}
span.m
{color:blue;}
span.ns
{color:red;}
span.t
{color:#990000;}
span.tx
{font-weight:bold;}
p.CharStyle, li.CharStyle, div.CharStyle
{margin:0in;
margin-bottom:.0001pt;
font-size:10.0pt;
font-family:Verdana;}
span.CharStyleChar
{font-family:Verdana;}
span.CharStyle2
{font-family:Verdana;
font-weight:bold;}
@page Section1
{size:8.5in 11.0in;
margin:1.0in 1.25in 1.0in 1.25in;}
div.Section1
{page:Section1;}
--></style></head><body lang=EN-US link=blue vlink=purple><div class=Section1><p class=MsoNormal><span class=CharStyle2>This</span> is a test.</p></div></body></html>
 
