Character entities in Infopath XML

Nick Head · Oct 31, 2004

We've got some Word RTF docs that we are saving as HTML and then transforming
to IP-compatible XML for editing. The problem occurs when extended characters
are found in the XML file.

For example the character ≥ is exported by MS Word as &ge; when you save as
filtered HTML. Then when trying to open any document with a character like
this in IP I get the error:

"The form contains schema validation errors - Reference to undefined entity
'ge'."

'Fair enough!' I thought and so added a DTD character entity reference to
the document so that it knew how to handle the character. My resulting XML
looks like this (with IP processing instructions and namespace references
removed for clarity):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE myFields [
<!ENTITY ge "≥" >
]>
<my:myFields>
<my:legacyContent>
3 &ge; 2
</my:legacyContent>
</my:myFields>

However I still get the same issue. I tried opening this XML file in IE to
test it, and sure enough it displays perfectly with no validation errors.
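
The same check can be reproduced outside IE. Below is a minimal sketch in Python (my choice of language, not something used in this thread) using the standard library's ElementTree parser. The element names are unprefixed so the snippet is well-formed without the namespace declarations, and the helper name legacy_text is purely illustrative. It only shows how a general-purpose XML parser treats an internal DTD entity; it says nothing about what IP does internally.

```python
import xml.etree.ElementTree as ET

# Same shape as the sample above, with an internal DTD that declares "ge".
WITH_DTD = """<?xml version="1.0"?>
<!DOCTYPE myFields [
<!ENTITY ge "&#8805;">
]>
<myFields>
<legacyContent>3 &ge; 2</legacyContent>
</myFields>"""

# The same content with no declaration for "ge".
WITHOUT_DTD = "<myFields><legacyContent>3 &ge; 2</legacyContent></myFields>"


def legacy_text(xml_text):
    """Parse the document and return the text of <legacyContent>."""
    root = ET.fromstring(xml_text)
    return root.find("legacyContent").text


# With the declaration, the parser expands the entity reference.
resolved = legacy_text(WITH_DTD)

# Without it, parsing fails on the undefined entity.
try:
    legacy_text(WITHOUT_DTD)
    undefined_entity_rejected = False
except ET.ParseError:
    undefined_entity_rejected = True

print(resolved == "3 \u2265 2", undefined_entity_rejected)
```

If a parser like this accepts the file, the XML itself is fine, which suggests the error comes from how IP treats the DTD.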

Has anyone managed to successfully do this in IP before? Or does IP just not
handle DTD character entity references?

TIA
Nick

Matthew Blain (Serriform) · Oct 31, 2004

If you can, consider saving as WordML, though that may be harder to
transform.
Alternatively, HTML Tidy can write XHTML that uses numeric character
references instead of named entities.

I have no idea if InfoPath supports DTDs; perhaps someone else here does.

--Matthew Blain
http://tips.serriform.com/
http://www.developingsolutionswithinfopath.com/

Nick Head · Nov 1, 2004

Perfect! Thanks Matthew

Not technically an Infopath issue here at all but I'll outline the solution
for google:

First process the HTML using HTML Tidy with the 'numeric-entities' switch
turned on. Instead of generating named entities such as &ge; it outputs the
numeric version, e.g. &#8805; (the decimal form of U+2265).

These characters can now be read and edited within IP without requiring a
DTD reference.
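
For anyone who wants to see what that substitution amounts to, here is a rough Python sketch. It is a stand-in for illustration only, not HTML Tidy and not how Tidy is implemented: it swaps HTML named entities for decimal references using the standard library's html.entities table. The function name named_to_numeric is made up for this example.

```python
import re
from html.entities import name2codepoint

# The five entities XML itself predefines; these are left untouched.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def named_to_numeric(text):
    """Replace HTML named entities such as &ge; with decimal references."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave XML-native or unknown names alone
        return "&#%d;" % name2codepoint[name]

    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", replace, text)


print(named_to_numeric("3 &ge; 2 &amp; 1 &lt; 2"))
# prints: 3 &#8805; 2 &amp; 1 &lt; 2
```

A regex pass like this knows nothing about CDATA sections or comments, so for real documents Tidy is still the safer route.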

As an aside, a numeric reference is plain ASCII text, so &#8805; takes up 7
bytes in the file. When you save or insert new special symbols, Infopath
writes the character itself instead. In UTF-8 that is 3 bytes for ≥
(0xE2 0x89 0xA5), and every ASCII character still takes one byte, so the file
does not grow. A 2-byte form such as 0x65 0x22, together with a doubling of
the file size, is what you would see if the file were saved as UTF-16
(little-endian) rather than UTF-8.
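
The byte counts are easy to check for yourself. Here is a quick Python sketch (again my choice of language; the variable names are arbitrary) that encodes the ≥ sign three ways and compares the sizes:

```python
symbol = "\u2265"  # the greater-than-or-equal sign

as_reference = "&#8805;".encode("ascii")  # numeric character reference
as_utf8 = symbol.encode("utf-8")          # the character itself in UTF-8
as_utf16le = symbol.encode("utf-16-le")   # the character itself in UTF-16LE

print(len(as_reference))                  # 7
print(len(as_utf8), as_utf8.hex())        # 3 e289a5
print(len(as_utf16le), as_utf16le.hex())  # 2 6522

# Plain ASCII text: one byte per character in UTF-8, two in UTF-16.
ascii_utf8 = len("3 2".encode("utf-8"))
ascii_utf16 = len("3 2".encode("utf-16-le"))
print(ascii_utf8, ascii_utf16)            # 3 6
```

So the byte pair 0x65 0x22 is the UTF-16 little-endian form of the character, and only UTF-16 doubles the size of the ASCII parts of a file.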

Cheers
Nick

