XML encoding issues

Samuel C. Yang

When I want to embed my own XML in a <SolutionXML> element, I have to assign
a System.String to the Formula(U) property of an appropriate cell
(named, say, "User.myRow.Value"). There are several problems with this:

(1) I do not see a way to find out the character encoding to use for my XML
string. In looking at generated *.vdx files, I see that the encoding always
seems to be "utf-8". However, I do not see that encoding officially
specified in the Visio documentation, and I do not see a programmatic way to
get that information either. So, there is no way to know the right encoding
to use for my embedded XML.

(2) Even assuming, for the sake of argument, that the encoding is always
supposed to be UTF-8, there is a problem when using the .NET framework (as
in a COM Add-in). In .NET, System.String objects always output UTF-16
strings! How do I output a UTF-8 string in .NET (remember, the object that
assigns to FormulaU must be of type System.String)?

(3) Now, if I cannot even assume that the encoding is always UTF-8 (and
assuming there is a programmatic way to find out the proper encoding to
use), then I have the generalized version of (2). That is, how do I
generate any given encoding under .NET?
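
For readers following along, here is a minimal sketch of the distinction behind questions (2) and (3). It is in Python rather than .NET and is not from the original post; the sample payload and the helper name `encode_as` are made up for illustration. The point is that a text string is just a sequence of characters, and an encoding only comes into play when the string is turned into bytes. In .NET the comparable step would be something like System.Text.Encoding.UTF8.GetBytes(text).

```python
# Illustration only: the payload and helper name are hypothetical.
# Hebrew letters are written as escapes so the source reads left to right.
xml_text = '<SolutionXML Name="demo"><note>abc \u05e9\u05dc\u05d5\u05dd</note></SolutionXML>'


def encode_as(text: str, encoding: str) -> bytes:
    """Serialize text to bytes in the requested encoding (e.g. 'utf-8')."""
    return text.encode(encoding)


utf8_bytes = encode_as(xml_text, "utf-8")
utf16_bytes = encode_as(xml_text, "utf-16-le")

# The byte sequences differ, but both decode back to the same text.
assert utf8_bytes != utf16_bytes
assert utf8_bytes.decode("utf-8") == xml_text
assert utf16_bytes.decode("utf-16-le") == xml_text
```

So the in-memory representation of the string and the encoding of the file it ends up in are separate concerns; whoever writes the file picks the bytes.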
 
Samuel C. Yang

More info and comments:

The SolutionXMLElement property of Visio Document objects has the same
problem. That is, the XML string assigned to the property is encoded in
UTF-16 in the *.vdx file, whereas the rest of the XML file is encoded in
UTF-8. I was half hoping that at least the implementation of the
SolutionXMLElement property would take the System.String and convert it to
UTF-8 before storing it in the *.vdx file, but no such luck.

It looks like Visio 2003 was not tested for globalization with .NET clients.
I think the easiest fix would be to add a property (say, "XMLEncoding") to
the IVDocument class, and that it should support at least these two values:
"utf-8" and "utf-16". The default value can remain "utf-8", I guess, but
Visio should throw an exception if anyone tries to store a <SolutionXML>
element (in any of the various ways supported) when the "XMLEncoding"
property is anything but "utf-16".

Without that kind of fix, it seems to me you are going to have to do some
major surgery to the existing Visio classes (like having alternate versions
of the Formula(U) properties so that they can accept byte arrays in addition
to strings).
 
Samuel C. Yang

Or, an even better fix would be to have the COM Interop Assembly
implementations of the Formula(U) and SolutionXMLElement properties detect,
when appropriate, that a passed string is a legal <SolutionXML> element, and
automatically convert such string values to whatever encoding is appropriate
for the output *.vdx file. When assigning to a string from one of these
properties, the reverse conversion (to UTF-16) would also be done
automatically.

This requires no API changes at all, and makes the developer's task simple.
 
Samuel C. Yang

Never mind...

Well, it turns out that it's good that nobody else has responded yet to this
thread. [I seem to be talking to myself most of the time, anyway :)].

In doing further testing, it looks like the Visio Primary Interop Assembly
(PIA) is doing the right thing after all. That is, it is doing the proper
UTF-16/UTF-8 conversions when writing to and reading from *.vdx files, at
least in the limited new testing that I've done.
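
A rough way to picture the conversion described above, sketched in Python with the standard xml.etree.ElementTree module rather than the Visio PIA. This is only an assumption about the general shape of such a round trip, not Visio's actual implementation; the function name is hypothetical and only the element name comes from this thread.

```python
import io
import xml.etree.ElementTree as ET


def roundtrip_through_utf8(text: str) -> str:
    """Write text into a UTF-8 XML document, then parse it back to a string."""
    root = ET.Element("SolutionXML")  # element name from the thread; content is made up
    root.text = text

    buf = io.BytesIO()
    # The encoding is chosen here, at serialization time, not by the caller's string.
    ET.ElementTree(root).write(buf, encoding="utf-8", xml_declaration=True)

    # Parsing reads the declared encoding and hands back ordinary text again.
    parsed = ET.fromstring(buf.getvalue())
    return parsed.text


sample = "abc \u05e9\u05dc\u05d5\u05dd xyz"
assert roundtrip_through_utf8(sample) == sample
```

If the writer and reader both honor the declared encoding, the caller never has to deal with bytes at all.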

I had originally tested using some Hebrew text interspersed with Latin text,
and I believe the right-to-left nature of the Hebrew text caused some
confusion (either in me, or in the various editors/debuggers/browsers that I
used to look at things) -- the results had been very strange indeed.
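
One way to sidestep that kind of display confusion is to stop looking at the rendered text and dump the characters in their stored (logical) order instead. A small sketch in Python using the standard unicodedata module; the helper name `describe` is hypothetical and this is not the tooling used in the original testing.

```python
import unicodedata


def describe(text: str) -> list[str]:
    """List each character in logical order: code point, bidi class, name."""
    return [
        f"U+{ord(ch):04X} {unicodedata.bidirectional(ch)} {unicodedata.name(ch, '?')}"
        for ch in text
    ]


# Latin 'a' followed by Hebrew alef, written as an escape to avoid bidi rendering.
for line in describe("a\u05d0"):
    print(line)
```

Comparing such dumps before and after a save/load cycle shows whether the data changed or only the way an editor chose to draw it.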

Sam
 
Mark Nelson [MS]

This is good news. You raised an excellent question, and we have been
discussing it internally for a bit. Our basic conclusion is that you should
not have to worry about encoding when programmatically setting SolutionXML.
If you do encounter a problem, we'd like to know about it.

--
Mark Nelson
Microsoft Corporation

This posting is provided "AS IS" with no warranties, and confers no rights.


 
