Peter Rooney
I've discovered that saving a document with Save As HTML preserves
attributes such as italic ( <I> ... </I> ) and bold, as well as certain
characters, such as &#268; (the Czech letter C with a hacek). If the document
is a table, the content of each cell is saved inside HTML tag pairs like
<td> ... </td> . In fact, Save As HTML gives me a text file that I can
analyze and process with external programs.
So far so good. The problem is that the Save As conversion also gives me a
lot of trash that is of no use to me, such as:
<p class=MsoPlainText style='margin-left:.25 in. etc etc
<span style='font-size:12 etc ... </span>
etc
I can search and replace some of these strings, and I have developed
filters that take care of others. But it's a laborious process involving
several different programs. It would be better not to get the "trash" in the
first place. The Save As XML option is even worse. Is there a way to make a
simple conversion using Word or third-party software?
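
To illustrate the kind of filtering involved, here is a minimal sketch in
Python using only the standard library's html.parser module. It assumes that
only a small whitelist of tags is wanted (the KEEP set below is a hypothetical
choice), that all attributes can be thrown away, and that numeric character
references such as &#268; should pass through unchanged. The names
WordHtmlCleaner and clean_word_html are made up for illustration; they are not
part of Word or of any library.

```python
from html.parser import HTMLParser

# Hypothetical whitelist: tags worth keeping. Everything else is dropped,
# but the text inside dropped tags (e.g. <span>) is preserved.
KEEP = {"i", "b", "p", "table", "tr", "td"}

# Tags whose *content* should be discarded too (style sheets, scripts).
SKIP_CONTENT = {"style", "script"}


class WordHtmlCleaner(HTMLParser):
    def __init__(self):
        # convert_charrefs=False keeps references like &#268; as written
        # instead of decoding them to Unicode characters.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag names, so <I> arrives here as "i".
        if tag in SKIP_CONTENT:
            self.skip_depth += 1
        elif tag in KEEP and not self.skip_depth:
            self.out.append(f"<{tag}>")  # attributes are deliberately dropped

    def handle_endtag(self, tag):
        if tag in SKIP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in KEEP and not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

    def handle_charref(self, name):
        if not self.skip_depth:
            self.out.append(f"&#{name};")

    def handle_entityref(self, name):
        if not self.skip_depth:
            self.out.append(f"&{name};")


def clean_word_html(html):
    """Return html with only whitelisted, attribute-free tags left in."""
    parser = WordHtmlCleaner()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


sample = ("<p class=MsoPlainText style='margin-left:.25in'>"
          "<span style='font-size:12.0pt'><I>&#268;esky</I> text</span></p>")
print(clean_word_html(sample))  # prints <p><i>&#268;esky</i> text</p>
```

Comments and conditional blocks in the saved file are dropped as a side effect,
because the parser's default handlers for them do nothing. A whitelist is used
rather than a list of strings to remove, so that markup not anticipated in
advance is also discarded.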