Insert WordML inside HTML into Word?


Yves Dhondt


I have an HTML file with an embedded XML block containing WordML. When I try
to insert the file, the XML seems to be stripped out and nothing from inside
it gets inserted. However, normal HTML paragraphs do get inserted.

As I have no option but to use HTML, is there a way I can incorporate WordML
in there in a way that Word can actually process it?



Code to insert the HTML file:
ActiveDocument.Range.InsertFile "test.html"

Example HTML file:

<html xmlns="">
<xml id="SomeId">
<w:t xml:space="preserve">Hello</w:t>
<w:t xml:space="preserve"> </w:t>

Peter Jamieson

You could start with something like the following, but it does rely
completely on your Data Islands always having the same structure:

Sub GetFirstXmlIsland()
    Const xmlline1 = "<?xml version=""1.0"" encoding=""UTF-8""?>"
    Const xmlline2 = "<?mso-application progid=""Word.Document""?>"
    ' the file to load
    Const strFileToLoad = "c:\a\testhtml.htm"
    ' a temp file for extracted XML
    Const strTempXMLFileName = "c:\a\tempxml.xml"
    ' May need this later
    ' Dim objDomElement As MSXML2.IXMLDOMElement
    Dim objDOMDocument As MSXML2.DOMDocument
    Dim objDOMError As MSXML2.IXMLDOMParseError
    Dim objIsland As MSXML2.IXMLDOMNode
    Dim objFSO As Scripting.FileSystemObject
    Dim objTS As Scripting.TextStream
    Set objDOMDocument = CreateObject("MSXML2.DomDocument")
    objDOMDocument.async = False
    ' Load only succeeds if the HTML file is well-formed XML
    objDOMDocument.Load strFileToLoad
    If (objDOMDocument.parseError.ErrorCode <> 0) Then
        Set objDOMError = objDOMDocument.parseError
        MsgBox ("You have error " & objDOMError.reason)
    Else
        ' Assumption: the WordML root is the first child element of the
        ' first <xml> island; adjust the query to your own structure
        Set objIsland = objDOMDocument.selectSingleNode("//xml/*")
        If objIsland Is Nothing Then
            MsgBox "No XML island found"
        Else
            Set objFSO = CreateObject("Scripting.FileSystemObject")
            Set objTS = objFSO.OpenTextFile( _
                FileName:=strTempXMLFileName, _
                IOMode:=ForWriting, _
                Create:=True)
            objTS.Write _
                xmlline1 & vbCrLf & _
                xmlline2 & vbCrLf & _
                objIsland.XML
            objTS.Close
            Set objTS = Nothing
            Set objFSO = Nothing
            ' Assumption: suppress the "Convert File" prompt
            Selection.Range.InsertFile _
                FileName:=strTempXMLFileName, _
                ConfirmConversions:=False
        End If
    End If
    Set objDOMDocument = Nothing
End Sub

You need to set references to the Microsoft XML (MSXML) and Microsoft
Scripting Runtime libraries in the VBA editor.

That's my first attempt so you know at least as much as I do now. As
long as your input files have a consistent structure there's no real
reason why you couldn't process the input without an MSXML object and
find and strip off the HTML stuff and the <xml> wrapper Element using
"traditional" code.

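A minimal sketch of that "traditional" approach, using only string
functions. It assumes the file contains a single, un-nested data island
whose tags are written in lower case (<xml ...> and </xml>); the function
name ExtractXmlIsland is made up for illustration.

```vba
' Returns the text between the first <xml ...> opening tag and the
' following </xml> closing tag, or "" if no island is found.
' Assumption: lower-case tag names, no nested <xml> elements.
Function ExtractXmlIsland(ByVal strHtml As String) As String
    Dim lngOpen As Long
    Dim lngStart As Long
    Dim lngEnd As Long

    lngOpen = InStr(1, strHtml, "<xml", vbBinaryCompare)
    If lngOpen = 0 Then Exit Function

    ' skip to the end of the opening tag (it may carry attributes such as id)
    lngStart = InStr(lngOpen, strHtml, ">", vbBinaryCompare)
    If lngStart = 0 Then Exit Function
    lngStart = lngStart + 1

    lngEnd = InStr(lngStart, strHtml, "</xml>", vbBinaryCompare)
    If lngEnd = 0 Then Exit Function

    ExtractXmlIsland = Mid$(strHtml, lngStart, lngEnd - lngStart)
End Function
```

The result could then be written to the temp file behind the two
processing-instruction lines, in the same way as in the Sub above.
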
Peter Jamieson

Yves Dhondt

Thanks for the suggestion, but it won't work in my case as I have no control
over the import process. It's a call to "InsertFile", so the HTML file
cannot be preprocessed. If that were possible, I would just have used
"InsertXml" directly.

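For comparison, a direct InsertXML call would look roughly like the sketch
below. The helper name MinimalWordML and its payload are illustrative only:
a hand-written, minimal Word 2003 WordML document, not the real data.

```vba
' Builds a minimal Word 2003 WordML document holding one paragraph.
' Assumption: strText contains no characters that need XML escaping.
Function MinimalWordML(ByVal strText As String) As String
    MinimalWordML = "<?xml version=""1.0""?>" & _
        "<w:wordDocument xmlns:w=""http://schemas.microsoft.com/office/word/2003/wordml"">" & _
        "<w:body><w:p><w:r><w:t>" & strText & "</w:t></w:r></w:p></w:body>" & _
        "</w:wordDocument>"
End Function

Sub InsertWordMLDirectly()
    ' Replaces the current selection with the WordML content
    Selection.Range.InsertXML MinimalWordML("Hello")
End Sub
```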

Peter Jamieson

Well, it doesn't sound as if you have much leeway, but...

If you have no control over the process, then I know of no magic
parameters/options in InsertFile that would help. The only way I can
imagine doing this, given the constraints you mention, would require that
a. you are using Word 2007 SP2
b. you create a new-style text converter (it has to be an
out-of-process COM object that implements the new converter interface)
c. you are able to associate your filename extension unambiguously
with that converter. I have a suspicion you wouldn't be able to do that,
even temporarily, if the extension is .htm/.html. AFAICS, although it is
possible to specify a particular converter in an INCLUDETEXT /field/,
there's no way to do it in the InsertFile method, even if you insert the
file as a linked field.
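
To illustrate the field route in (c): the INCLUDETEXT field code accepts a
\c switch naming a converter class. The sketch below only shows how such a
field could be created from VBA. "MyWordMLConverter" is a placeholder, not a
real converter class name, and both procedure names are made up.

```vba
' Builds the field code text for an INCLUDETEXT field with a \c switch.
' Backslashes in the path must be doubled inside a field code.
Function BuildIncludeTextCode(ByVal strPath As String, _
                              ByVal strConverter As String) As String
    BuildIncludeTextCode = "INCLUDETEXT """ & _
        Replace(strPath, "\", "\\") & """ \c " & strConverter
End Function

Sub InsertViaIncludeTextField()
    ' wdFieldEmpty lets the Text argument carry the complete field code
    ActiveDocument.Fields.Add _
        Range:=Selection.Range, _
        Type:=wdFieldEmpty, _
        Text:=BuildIncludeTextCode("c:\a\testhtml.htm", "MyWordMLConverter"), _
        PreserveFormatting:=False
End Sub
```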

AFAICS a new-style converter has to convert the source format to WordML,
so all you would need to do is strip the HTML and the <xml> wrapper. If you
had to implement an old-style converter, it would have to convert WordML
to RTF, which wouldn't be my idea of fun.

Peter Jamieson
