Sander Voerman
Hi,
I have a website on which I have published some of my essays, which were
originally written in Word. This is how I used to edit them:
* Save the document as a web page from MS Word
* Manually edit the HTML file in Notepad or WordPad:
  * Edit the html, doctype and meta headers
  * Remove all Word XML and Office XML tags and all style tags from the
    document
  * Remove all additional Microsoft properties from italic tags and the like
  * Replace all non-XML special characters with the appropriate XML escape
    codes
  * Remove all conditionals such as <![if !supportEmptyParas]> and the like
  * Replace all style class names and ids with the class names and ids
    specified by the CSS sheets of my website (you can't do this from within
    Word because some built-in features have predefined class names like
    MsoNormal and MsoTitle; of course I used the replace function of WordPad
    wherever possible)
  * Remove everything related to footnotes and literature references, and
    replace it with JavaScript function calls to the neat scripts of my website
  * Add header and footer JavaScript function calls
* Done!
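
Several of the manual steps above (removing comments and conditionals, style blocks, Office tags and style attributes, and renaming classes) are mechanical enough to script. What follows is only a minimal sketch in Python using regular expressions; it is not a complete or robust converter. The function name clean_word_html and the class names in CLASS_MAP are made up for illustration, and the patterns assume markup of the kind Word's "Save as Web Page" typically produces. The footnote and JavaScript steps are site-specific and are not covered.

```python
import re

# Hypothetical mapping from Word's built-in class names to the site's own.
CLASS_MAP = {"MsoNormal": "body-text", "MsoTitle": "essay-title"}

def clean_word_html(html, class_map=CLASS_MAP):
    """Strip common Word-specific markup from an HTML string (rough sketch)."""
    # 1. Drop HTML comments, including <!--[if gte mso 9]> ... <![endif]--> blocks.
    html = re.sub(r"<!--.*?-->", "", html, flags=re.S)
    # 2. Drop bare conditional markers such as <![if !supportEmptyParas]> and
    #    <![endif]>, keeping whatever sits between them.
    html = re.sub(r"<!\[(?:if[^\]]*|endif)\]>", "", html, flags=re.I)
    # 3. Drop <style> and <xml> blocks together with their content.
    html = re.sub(r"<(style|xml)\b.*?</\1\s*>", "", html, flags=re.S | re.I)
    # 4. Drop namespaced Office tags such as <o:p> and </o:p>.
    html = re.sub(r"</?[a-z]+:[^>]*>", "", html, flags=re.I)
    # 5. Drop style= and lang= attributes. Caveat: this is a plain text
    #    substitution, so it would also hit the same pattern in body text.
    html = re.sub(r"""\s+(?:style|lang)\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)""",
                  "", html, flags=re.I)

    # 6. Rename classes found in the mapping; Word often writes them unquoted.
    def rename(match):
        name = match.group(1)
        return 'class="%s"' % class_map.get(name, name)

    return re.sub(r"""\bclass\s*=\s*["']?([\w-]+)["']?""", rename, html)

sample = (
    "<p class=MsoNormal style='margin:0'>"
    '<i style="mso-bidi-font-style:normal">Hi</i><o:p></o:p></p>'
    "<![if !supportEmptyParas]>&nbsp;<![endif]>"
    "<!--[if gte mso 9]><xml><w:WordDocument/></xml><![endif]-->"
)
print(clean_word_html(sample))
```

Regular expressions are fragile on arbitrary HTML, so the result would still need checking by eye, but this kind of script could take over most of the search-and-replace work.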
As you can imagine, this is a lot of work to do manually, especially when
you write large essays. Now I am going to start a second (PHP/MySQL-based)
website, which will feature a lot of articles written in Word by other
people. Is there any way I can automate some or all of the steps mentioned
above? It would already be much easier for me if I just had a tool that
converted Word documents to simple, plain HTML or XML files that only
contain basic document structure (emphasis, superscript and subscript,
paragraphs, blockquotes, headings and notes, but no fonts, margins and
similar formatting) and converted all special characters to valid XML escapes.
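
To make the request concrete, here is a rough sketch of what such a tool could look like, written in Python with only the standard library. It is an illustration under assumptions, not an existing product. It takes the HTML that Word saves, keeps a whitelist of structural tags without their attributes, drops everything else, and writes non-ASCII characters as numeric character references. The names KEEP, SKIP, StructureOnly, to_xml_text and to_basic_structure are made up for this example, and footnotes are not handled.

```python
from html import escape
from html.parser import HTMLParser

# Structural tags to keep (without attributes); all other tags are dropped.
KEEP = {"p", "em", "i", "strong", "b", "sup", "sub", "blockquote",
        "h1", "h2", "h3", "h4", "h5", "h6"}
# Elements whose entire content is discarded.
SKIP = {"head", "style", "script"}

def to_xml_text(text):
    """Escape &, < and >, and turn non-ASCII characters into &#NNN; references."""
    escaped = escape(text, quote=False)
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")

class StructureOnly(HTMLParser):
    def __init__(self):
        # convert_charrefs=True decodes entities like &eacute; before handle_data.
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # greater than 0 while inside a SKIP element

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.skip_depth += 1
        elif tag in KEEP and not self.skip_depth:
            self.out.append("<%s>" % tag)  # attributes are dropped on purpose

    def handle_endtag(self, tag):
        if tag in SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in KEEP and not self.skip_depth:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(to_xml_text(data))

def to_basic_structure(html):
    parser = StructureOnly()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)

sample = (
    "<html><head><style>p{margin:0}</style></head><body>"
    "<p class=MsoNormal><span style='font-family:Arial'>"
    "Caf&eacute; <i>na\u00efve</i> x<sup>2</sup> &amp; more</span></p>"
    "</body></html>"
)
print(to_basic_structure(sample))
```

A whitelist is used here rather than a list of things to remove, so unknown Word-specific tags and attributes are dropped by default. The output is only well-formed XML if the kept tags were properly nested in the input.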
Sander