Losing Unicode characters when copying HTML into a Word.Range object

raj

I am working in VB6 and Word 2003, creating a Word document from source
material that has been converted from RTF to HTML. I take the HTML and paste
it into a Word.Range object. This works fine in almost all cases, but I have
found that certain special characters, § for instance, are lost during the
conversion.
The code we use to insert this HTML into the Range object is a hack that
copies the HTML to the Windows clipboard and then uses the Range.Paste method
to insert it.

Is there a better way to programmatically insert HTML-formatted text into a
Word document? If not, I can keep using our current method, provided I can
figure out how to ensure that the special characters are inserted properly.

PS: The special characters work fine if I open the Word document and type
them in manually using Alt + the numeric character code.

Thanks in advance for any advice you can provide.
 

Peter Jamieson

A few guesses...

Maybe you have solved this already, but if you open one of your converted
HTML files in (say) Notepad:
a. If you try File|Save As, does Notepad offer to save the file as ANSI or
Unicode? (This tells you something about the actual encoding of the .htm
file.)
b. Can you see how the "problem" characters are represented? Do you just see
a "?" for each one, or several characters?
c. Can you see a header in the HTML file that declares the character encoding
used?
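The three checks above can also be scripted when there are many files. A minimal sketch in Python (used here only for illustration; the thread's own code is VB6), assuming the converted file is read as raw bytes: it looks for a byte-order mark, searches for a meta charset declaration, and counts literal "?" characters. The function name `inspect_html` is hypothetical, and the "?" count is only a rough signal, since URLs and query strings also contain "?".

```python
import re

def inspect_html(raw: bytes) -> dict:
    """Rough encoding diagnostics for a converted .htm file read as raw bytes."""
    # (a) A byte-order mark is a strong hint that the file is Unicode, not ANSI.
    if raw.startswith(b"\xef\xbb\xbf"):
        bom = "utf-8"
    elif raw.startswith(b"\xff\xfe"):
        bom = "utf-16-le"
    elif raw.startswith(b"\xfe\xff"):
        bom = "utf-16-be"
    else:
        bom = None
    # Decode leniently (latin-1 never fails) just so the text can be searched.
    text = raw.decode(bom or "latin-1", errors="replace")
    # (c) Look for a <meta ... charset=...> declaration in the header.
    match = re.search(r"<meta[^>]*charset\s*=\s*[\"']?([-\w]+)", text, re.IGNORECASE)
    declared = match.group(1).lower() if match else None
    # (b) Literal "?" characters may be lost characters (but URLs contain "?" too).
    return {"bom": bom, "declared_charset": declared, "question_marks": text.count("?")}
```

Call it with the bytes of one file, e.g. `inspect_html(open("converted.htm", "rb").read())`, where the file name is a placeholder.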

In (b), if you only see "?", chances are that the character information was
lost in the transformation from RTF to HTML. You would have to go back to the
RTF and use a different conversion method to recover it.

Otherwise, if there is no character encoding information in the HTML file,
perhaps Word assumes a different encoding from the one actually used in the
file. I don't know what you can do about that: the options in Word's
Tools|Options|General|Web Options seem to be more about the encoding used when
you save files, but maybe you could add the correct HTML header to each file
before processing it.
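Adding such a header before the paste step could look like the following minimal sketch (Python for illustration; `add_charset_meta` is a hypothetical helper). It assumes you already know which encoding the RTF-to-HTML converter actually wrote, for example windows-1252 for a file Notepad reports as ANSI on a Western-European system; declaring the wrong charset would make matters worse, not better.

```python
import re

def add_charset_meta(html: str, charset: str) -> str:
    """Declare `charset` in a Content-Type meta tag if the HTML declares none."""
    if re.search(r"<meta[^>]*charset", html, re.IGNORECASE):
        return html  # keep an existing declaration (check it by hand if it looks wrong)
    meta = f'<meta http-equiv="Content-Type" content="text/html; charset={charset}">'
    head = re.search(r"<head(\s[^>]*)?>", html, re.IGNORECASE)
    if head:
        # Put the declaration first inside <head> so it precedes other content.
        return html[:head.end()] + meta + html[head.end():]
    # No <head>: create one right after <html ...>, or at the very start.
    root = re.search(r"<html(\s[^>]*)?>", html, re.IGNORECASE)
    pos = root.end() if root else 0
    return html[:pos] + "<head>" + meta + "</head>" + html[pos:]
```

An existing declaration is left untouched on purpose: if it is present but wrong, that is the separate mismatch case, and silently adding a second, conflicting tag would not fix it.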

Or maybe the encoding information is wrong, i.e. the HTML header does not
match the actual encoding used.

Or possibly Word gets it wrong anyway (I don't know).
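A different workaround, not one proposed above, is to make the encoding question irrelevant: replace every non-ASCII character with an HTML numeric character reference, so § (U+00A7) becomes `&#167;`, which an HTML parser should decode the same way whatever charset is declared. It may also matter for the clipboard route, since Microsoft documents the "HTML Format" clipboard format as UTF-8, and pure ASCII text is byte-for-byte the same in ANSI and UTF-8. A minimal Python sketch, assuming the HTML has already been decoded into a correct Unicode string (if the characters are already "?" at that point, this cannot bring them back); `to_ascii_charrefs` is a hypothetical name. References are not decoded inside `<script>` or `<style>` blocks, which should not matter for text converted from RTF.

```python
def to_ascii_charrefs(html: str) -> str:
    """Replace every non-ASCII character with an HTML numeric character reference."""
    # "xmlcharrefreplace" turns characters that ASCII can't encode into &#NNN;
    # so the result is pure ASCII and no longer depends on the declared charset.
    return html.encode("ascii", "xmlcharrefreplace").decode("ascii")
```

For example, `to_ascii_charrefs("<p>§ 10</p>")` returns `<p>&#167; 10</p>`.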

Peter Jamieson
 
