Parsing Word To HTML using ASP.Net

Nick

Hello Everyone,

Can someone please give me some advice on how to parse
the contents of an HTML file that was originally an MS
Word doc?

So far, my application lets users upload their docs to
my server, and I'm able to save them as HTML. However,
all of the proprietary MS HTML markup is included in
the page.

I need to know how I can parse the contents of this
page out into an XML doc.

Thank you in advance.

Nick
 
Cindy M -WordMVP-

Hi Nick,
> So far, my application allows the users to upload their
> docs to my server and I'm able to save it as html.
> However, all the proprietary ms html code is included
> within the page.
>
> I need to know how I can parse the contents out of this
> page into an xml doc.

You might want to invest in Office 2003, since Word 2003
can save to Word's WordML format, which is basically XML.
It's documented, and once you've figured out the schema
("object model") it uses, you should be able to extract
the data with a transform.
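
To give a rough idea of what that extraction could look like, here is
a minimal sketch. It is written in Python for brevity rather than
ASP.NET, and it walks the XML tree directly instead of using an XSLT
transform. It assumes the file uses the Word 2003 WordML namespace and
that paragraph text sits in w:t elements inside w:p elements. The
function name and the output element names (document, para) are
hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by Word 2003 WordML files.
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"
W = "{" + W_NS + "}"


def wordml_to_simple_xml(wordml_text):
    """Return a <document> element with one <para> per non-empty w:p.

    The output element names are hypothetical; pick whatever schema
    your application actually needs.
    """
    source = ET.fromstring(wordml_text)
    document = ET.Element("document")
    # iter() searches the whole tree, so paragraphs nested inside
    # sections or table cells are found as well.
    for paragraph in source.iter(W + "p"):
        # A paragraph's text is split across runs; join every w:t in it.
        text = "".join(t.text or "" for t in paragraph.iter(W + "t"))
        if text.strip():
            ET.SubElement(document, "para").text = text
    return document


# A tiny hand-written sample, not real Word output.
SAMPLE = """<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <w:body>
    <w:p><w:r><w:t>Hello </w:t></w:r><w:r><w:t>world</w:t></w:r></w:p>
    <w:p/>
    <w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>
  </w:body>
</w:wordDocument>"""

if __name__ == "__main__":
    result = wordml_to_simple_xml(SAMPLE)
    print(ET.tostring(result, encoding="unicode"))
```

The same idea should carry over to .NET's XML classes or to an XSLT
stylesheet: select the w:p elements, concatenate their w:t text, and
write the result into your own elements. Formatting, tables, and
images would need extra handling that this sketch leaves out.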

Cindy Meister
INTER-Solutions, Switzerland
http://homepage.swissonline.ch/cindymeister (last update
Sep 30 2003)
http://www.mvps.org/word

This reply is posted in the newsgroup; please post any
follow-up questions or replies in the newsgroup and not
by e-mail :)
 
