Insert WordML inside HTML into Word?

Yves Dhondt

Hello,

I have an HTML file containing an XML data island with WordML inside it. When I
insert the file, the XML seems to be stripped out and none of its content gets
inserted. Normal HTML paragraphs, however, do get inserted.

As I have no option but to use HTML, is there a way to embed WordML in it so
that Word can actually process it?

TIA,

Yves

Code to insert the HTML file:
ActiveDocument.Range.InsertFile "test.html"

Example HTML file:

<html xmlns="http://www.w3.org/TR/REC-html40">
<head></head>
<body>
<xml id="SomeId">
<w:wordDocument
xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
<w:p>
<w:pPr/>
<w:r>
<w:rPr>
<w:u/>
</w:rPr>
<w:t xml:space="preserve">Hello</w:t>
</w:r>
<w:r>
<w:rPr/>
<w:t xml:space="preserve"> </w:t>
</w:r>
<w:r>
<w:rPr>
<w:b/>
</w:rPr>
<w:t>World</w:t>
</w:r>
<w:r>
<w:rPr/>
<w:t>!</w:t>
</w:r>
</w:p>
</w:wordDocument>
</xml>
</body>
</html>
 
Peter Jamieson

You could start with something like the following, but it relies
completely on your data islands always having the same structure:

Sub GetFirstXmlIsland()
    Const xmlline1 = "<?xml version=""1.0"" encoding=""UTF-8"" standalone=""yes""?>"
    Const xmlline2 = "<?mso-application progid=""Word.Document""?>"
    ' the file to load
    Const strFileToLoad = "c:\a\testhtml.htm"
    ' a temp file for extracted XML
    Const strTempXMLFileName = "c:\a\tempxml.xml"
    ' May need this later
    ' Dim objDomElement As MSXML2.IXMLDOMElement
    Dim objDOMDocument As MSXML2.DOMDocument
    Dim objDOMError As MSXML2.IXMLDOMParseError
    Dim objIsland As MSXML2.IXMLDOMNode
    Dim objFSO As Scripting.FileSystemObject
    Dim objTS As Scripting.TextStream
    Set objDOMDocument = CreateObject("MSXML2.DomDocument")
    objDOMDocument.async = False
    objDOMDocument.Load strFileToLoad
    If (objDOMDocument.parseError.ErrorCode <> 0) Then
        Set objDOMError = objDOMDocument.parseError
        MsgBox ("You have error " & objDOMError.reason)
    Else
        ' Locate the first <xml> data island. (The <html> element's
        ' first child is <head>, not the island.) The island's first
        ' child is the <w:wordDocument> element.
        Set objIsland = objDOMDocument.getElementsByTagName("xml").Item(0)
        If objIsland Is Nothing Then
            MsgBox "No <xml> data island found"
        Else
            Set objFSO = CreateObject("Scripting.FileSystemObject")
            Set objTS = objFSO.OpenTextFile( _
                FileName:=strTempXMLFileName, _
                IOMode:=ForWriting, _
                Create:=True)
            ' Write a standalone WordML file: declaration, processing
            ' instruction, then the <w:wordDocument> element
            objTS.Write _
                xmlline1 & vbCrLf & _
                xmlline2 & vbCrLf & _
                objIsland.FirstChild.CloneNode(deep:=True).XML
            objTS.Close
            Set objTS = Nothing
            Set objFSO = Nothing
            Selection.Range.InsertFile _
                FileName:=strTempXMLFileName, _
                ConfirmConversions:=False
        End If
    End If
    Set objDOMDocument = Nothing
End Sub

You need to set references to the Microsoft XML (MSXML) and Microsoft
Scripting Runtime libraries (Tools > References in the VBA editor).

That's my first attempt, so you know at least as much as I do now. As
long as your input files have a consistent structure, there's no real
reason why you couldn't process the input without MSXML, finding and
stripping off the HTML markup and the <xml> wrapper element using
"traditional" string-handling code.
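A minimal sketch of that "traditional" approach, under stated assumptions: the file contains a single <xml ...>...</xml> data island as in Yves's example, the file is plain ANSI/ASCII text, and no other "<xml" text appears before the island. ExtractXmlIsland and InsertIslandFromFile are hypothetical names, and instead of writing a temp file it passes the extracted WordML to Range.InsertXML (which Yves mentions).

```vba
' Sketch only: assumes a single <xml ...>...</xml> data island and an
' ANSI/ASCII file. ExtractXmlIsland and InsertIslandFromFile are
' hypothetical names, not part of Word's object model.
Option Explicit

' Returns the text between the first <xml ...> opening tag and the
' following </xml> closing tag, or "" if no island is found.
Function ExtractXmlIsland(ByVal strHtml As String) As String
    Dim lngStart As Long
    Dim lngOpenEnd As Long
    Dim lngClose As Long
    ' Start of the <xml> wrapper element (case-insensitive search)
    lngStart = InStr(1, strHtml, "<xml", vbTextCompare)
    If lngStart = 0 Then Exit Function
    ' The ">" that ends the opening tag, e.g. <xml id="SomeId">
    lngOpenEnd = InStr(lngStart, strHtml, ">")
    If lngOpenEnd = 0 Then Exit Function
    ' The closing </xml> tag after the opening tag
    lngClose = InStr(lngOpenEnd, strHtml, "</xml>", vbTextCompare)
    If lngClose = 0 Then Exit Function
    ' Everything between the wrapper tags, i.e. the WordML itself
    ExtractXmlIsland = Mid$(strHtml, lngOpenEnd + 1, lngClose - lngOpenEnd - 1)
End Function

Sub InsertIslandFromFile()
    ' Same example path as in the macro above
    Const strFileToLoad = "c:\a\testhtml.htm"
    Dim intFile As Integer
    Dim strHtml As String
    Dim strIsland As String
    ' Read the whole file into a string (bytes mapped via the ANSI code page)
    intFile = FreeFile
    Open strFileToLoad For Binary Access Read As #intFile
    strHtml = Space$(LOF(intFile))
    Get #intFile, , strHtml
    Close #intFile
    strIsland = ExtractXmlIsland(strHtml)
    If Len(strIsland) = 0 Then
        MsgBox "No <xml> data island found"
    Else
        ' Hand the WordML to Word, which parses it and inserts the content
        Selection.Range.InsertXML XML:=strIsland
    End If
End Sub
```

Plain string searching keeps this free of MSXML, but it is fragile: an "<xml" or "</xml>" appearing anywhere else (in a comment, say) would throw it off, which is exactly the "consistent structure" caveat above.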

Peter Jamieson

http://tips.pjmsn.me.uk
 
Yves Dhondt

Thanks for the suggestion, but it won't work in my case because I have no
control over the import process. It's a call to "InsertFile", so the HTML
file cannot be preprocessed. If that were possible, I would just have used
"InsertXml" directly.

Yves
 
Peter Jamieson

Well, it doesn't sound as if you have much leeway, but...

if you have no control over the process, then I know of no magic
parameters/options in InsertFile that would help. The only way I can
imagine doing this, given the constraints you mention, would be:
a. you have to be using Word 2007 SP2
b. you create a new-style text converter (it has to be an
out-of-process COM object that implements the new converter interface)
c. you are able to associate your filename extension unambiguously
with that converter. I have a suspicion you wouldn't be able to do that,
even temporarily, if the extension is .htm/.html. AFAICS, although it is
possible to specify a particular converter in an INCLUDETEXT /field/
(with its \c switch), there's no way to do it in the InsertFile method,
only when you include the file using a field.

AFAICS a new-style converter has to convert the source format to WordML,
so all you would need to do is strip the HTML and the <xml> wrapper. If you
had to implement an old-style converter, it would have to convert WordML
to RTF, which wouldn't be my idea of fun.

Peter Jamieson

http://tips.pjmsn.me.uk
 
