How do I remove Word's AutoFormatting characters?

jtanaka

I have a macro I use to save a binary Word document as WordML (XML). When I
perform the save, I want to remove all the special character formatting that
Word may have applied. For example, Word's AutoFormat changes hyphens to
dashes and apostrophes to curly apostrophes. I would like my saved XML to be
clean of these Word-specific encodings. Is there a way to do this?

I tried adding the 'Encoding' and the 'AllowSubstitutions' parameters, but
this did not help.

It is not an option for me to have the authors of the documents I'm working
with disable the AutoFormatting options.

Sub SaveAsXML()

    ChangeFileOpenDirectory "C:\Temp\"

    Dim xmlfilename As String
    xmlfilename = ActiveDocument.Name
    ' Replace the three-letter extension with "xml"
    xmlfilename = Left$(xmlfilename, Len(xmlfilename) - 3) & "xml"

    ActiveDocument.SaveAs _
        FileName:=xmlfilename, _
        FileFormat:=wdFormatXML, _
        Encoding:=msoEncodingUTF8, _
        AllowSubstitutions:=True

End Sub


Thanks in advance,
Julie
 
Cindy M.

Hi Julie,

There's certainly nothing built into Word that will do this "filtering" for
you. You'd have to process the entire document, either before or after saving,
replacing such symbols with ones acceptable for your purposes.
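
As a rough illustration of the "before saving" route, here is a minimal VBA sketch that runs a series of Replace All passes over the main text. The macro name ReplaceAutoFormatCharacters and the replacement table are assumptions made for this example (nothing in Word ships such a list), and the choice of "--" for an em dash is arbitrary. It only touches the main document body, not headers, footers, or footnotes, and it modifies the open document, so the original should be closed without saving if its typography must be kept.

```vba
Sub ReplaceAutoFormatCharacters()
    ' Hypothetical helper, not part of the original macro.
    ' Pairs of: typographic character, plain replacement (assumed list).
    Dim pairs As Variant
    pairs = Array( _
        ChrW(8216), "'", _
        ChrW(8217), "'", _
        ChrW(8220), """", _
        ChrW(8221), """", _
        ChrW(8211), "-", _
        ChrW(8212), "--", _
        ChrW(8230), "...")

    ' Switch off "smart quotes as you type" during the replace; otherwise
    ' Word may curl the straight quotes again as they are inserted.
    Dim oldSmartQuotes As Boolean
    oldSmartQuotes = Options.AutoFormatAsYouTypeReplaceQuotes
    Options.AutoFormatAsYouTypeReplaceQuotes = False

    Dim i As Long
    For i = LBound(pairs) To UBound(pairs) Step 2
        With ActiveDocument.Content.Find
            .ClearFormatting
            .Replacement.ClearFormatting
            .Text = pairs(i)
            .Replacement.Text = pairs(i + 1)
            .Forward = True
            .Wrap = wdFindContinue
            .Format = False
            .MatchCase = True
            .MatchWildcards = False
            .Execute Replace:=wdReplaceAll
        End With
    Next i

    ' Restore the user's original setting.
    Options.AutoFormatAsYouTypeReplaceQuotes = oldSmartQuotes
End Sub
```

Calling ReplaceAutoFormatCharacters just before the ActiveDocument.SaveAs line would apply the substitutions to the text that gets written out as XML.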

Cindy Meister
INTER-Solutions, Switzerland
http://homepage.swissonline.ch/cindymeister (last update Jun 17 2005)
http://www.word.mvps.org

This reply is posted in the newsgroup; please post any follow-up question or
reply in the newsgroup and not by e-mail :)
 
jtanaka

Thank you for your reply. I can add this to my XSLT to replace these special
Word chars. I guess I was hoping that Word had thought of this to play nice
with the generic XML world :)

Thanks again for your help,
Julie
 
Klaus Linke

Hi Julie,

There are no special Word chars. Apostrophes and quotes and hyphens are the
same as everywhere else (Unicode, UTF-8).

If you need XML that only contains ANSI characters, there are probably tools
that replace non-ANSI characters with sensible substitutes. It would be a hard
job to write your own transform that covers all the thousands of possible
characters.

But maybe you're just doing something wrong importing the XML to whatever
program you use for further processing?
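
On the point above about a transform having to cover thousands of characters: one way to narrow that down is to check which characters actually occur in a given document. The following is only a sketch, assuming that "ANSI" here effectively means plain 7-bit ASCII, that the macro runs on Windows (it uses Scripting.Dictionary), and that the main text body is all that matters. The macro name ListNonAsciiCharacters is made up for this example.

```vba
Sub ListNonAsciiCharacters()
    ' Hypothetical helper: counts each character above U+007F
    ' in the main text and prints the tally to the Immediate window.
    Dim counts As Object
    Set counts = CreateObject("Scripting.Dictionary")

    Dim txt As String
    txt = ActiveDocument.Content.Text

    Dim i As Long
    Dim code As Long
    Dim key As String
    For i = 1 To Len(txt)
        ' AscW returns a signed Integer; mask it to the range 0..65535.
        code = AscW(Mid$(txt, i, 1)) And &HFFFF&
        If code > 127 Then
            key = "U+" & Right$("0000" & Hex$(code), 4)
            ' Reading a missing key yields Empty, which counts as 0 here.
            counts(key) = counts(key) + 1
        End If
    Next i

    Dim k As Variant
    For Each k In counts.Keys
        Debug.Print k, counts(k)
    Next k
End Sub
```

If the list turns out to be just a handful of quotes and dashes, a short replacement table is enough; if it is long, a dedicated conversion tool is probably the better choice.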

Regards,
Klaus
 
