XML tagging

Jorge

Hi

I'm having trouble marking up a Word document with XML: it seems that I cannot
freely decide where the elements of the XML Schema attached to the document
start and end.

This is the XML Schema that I built and that is attached to the Word document:

<?xml version="1.0" encoding="UTF-8"?>
<!-- edited with XMLSpy v2005 rel. 3 U (http://www.altova.com) by Jorge
(None) -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
elementFormDefault="qualified" attributeFormDefault="unqualified">
<xs:element name="tree">
<xs:complexType>
<xs:sequence>
<xs:element ref="branch" minOccurs="1" maxOccurs="unbounded">
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="branch">
<xs:complexType>
<xs:sequence>
<xs:element name="leaf" type="xs:string" minOccurs="1"
maxOccurs="unbounded">
</xs:element>
<xs:element ref="branch" minOccurs="0" maxOccurs="unbounded">
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>

These are the contents of my document:

text text<line break>
texttext text text<line break>
<line break>
text texttext text text<line break>
<line break>
text<line break>

I first applied the XML element "tree" to the whole text and it worked fine.
Then I tried to add the XML element "branch" to the first 3 lines and the
first 4 characters of the 4th line:

text text<line break>
texttext text text<line break>
<line break>
text

It didn't allow me to do this. Instead, it included only the first 3 lines
and dropped the 4 characters of the 4th line.
If I replace the line breaks with "soft line breaks", it works fine.

I guess there could be a conflict with WordML or something.

Thanks in advance.
 
Word Heretic

G'day Jorge,

You would be attempting to make this structure:

<p> blah blah <JorgesMark> yada yada </p>
<p> urble gurble </JorgesMark> burble fluff </p>

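This is not NESTED, this is INTERLEAVED, and the para is the major
object of a doc. XML elements have to nest cleanly, so a tag cannot
open inside one para and close part-way through another.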
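
To make the nested-versus-interleaved point concrete, here is a minimal Python sketch (not Word's own validation) that feeds both shapes to a standard XML parser. The `<doc>` wrapper, the sample text and the helper name `is_well_formed` are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the string parses as well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# JorgesMark opens inside the first para and closes inside the second:
# the tags overlap instead of nesting.
interleaved = (
    "<doc>"
    "<p>blah blah <JorgesMark>yada yada</p>"
    "<p>urble gurble</JorgesMark> burble fluff</p>"
    "</doc>"
)

# JorgesMark wraps whole paras, so every element closes inside its parent.
nested = (
    "<doc>"
    "<JorgesMark><p>blah blah yada yada</p>"
    "<p>urble gurble</p></JorgesMark>"
    "<p>burble fluff</p>"
    "</doc>"
)

print(is_well_formed(interleaved))
print(is_well_formed(nested))
```

Only the second shape parses, which is consistent with Word accepting a "branch" that covers whole paragraphs and refusing one that ends mid-paragraph.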

Steve Hudson - Word Heretic

steve from wordheretic.com (Email replies require payment)
Without prejudice


 
Jorge

I found that out yesterday afternoon, and since then I have been trying
to come up with ways to add XML elements to my document that respect the
hierarchical structure of both my XML Schema and WordML. I haven't succeeded
so far, though. It's a big shortcoming when marking up regular text documents
in XML.
Any suggestions?

Jorge
 
Word Heretic

G'day Jorge,

Only a few dodgy ones lol.

Dodgy XML - FnR
==============

When one para is followed by another of the same style, replace the
para mark with 2 manual line breaks. This will increase the allowable
incidences of the type you describe.
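
As a rough illustration of that find-and-replace idea outside Word, the Python sketch below merges runs of same-style paragraphs and joins them with two manual line breaks. It assumes the document has already been pulled out as (style, text) pairs, which is a made-up representation, and that a manual line break is character 11, as Word exposes it to macros. The function name `merge_same_style` is hypothetical.

```python
from itertools import groupby

# Assumption: a manual line break is character 11 (vertical tab).
MANUAL_LINE_BREAK = "\x0b"


def merge_same_style(paragraphs):
    """Collapse consecutive paragraphs that share a style into one paragraph.

    `paragraphs` is a list of (style_name, text) pairs. Each run of
    same-style paragraphs becomes a single pair whose text is joined with
    two manual line breaks in place of every paragraph mark.
    """
    merged = []
    for style, run in groupby(paragraphs, key=lambda pair: pair[0]):
        texts = [text for _, text in run]
        merged.append((style, (MANUAL_LINE_BREAK * 2).join(texts)))
    return merged


sample = [
    ("Body Text", "text text"),
    ("Body Text", "texttext text text"),
    ("Title", "text"),
]
print(merge_same_style(sample))
```

Fewer paragraph boundaries means fewer places where a tag is forced to stop, which is the whole point of the trick.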


Dodgy XML - Custom export
======================

Forget WordML, and with it all the auto-lovely fields and round-tripping,
and custom-export your content directly as structured XML. Lose the
whole 'para' object, and work off text chunks for as long as the para
style name remains constant. If you want to include round-tripping, we
simply tag up all the PARA PROPERTIES as unpaired tags before an
unpaired break that marks the end of the para.

Overly simplified example:

<tree>
A Collection of Heretical Ravings<style:=Title><br>
<branch>Word is like a dirty bum. Paper put near it gets covered in
the proverbial.<style:=Body Text><br>
However, Word is also like building a pyramid. </branch> If you lay
the foundations right, you can reach for the sky.</tree>
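
A hedged Python sketch of such an exporter follows. It is not a Word API: the event list, the function name `export_chunks` and the `<style name="..."/>` spelling of the unpaired style tag are all assumptions, chosen so that the output is well-formed XML rather than the shorthand used in the example above.

```python
from xml.sax.saxutils import escape, quoteattr


def export_chunks(events):
    """Serialise a flat list of (kind, value) events to an XML string.

    Hypothetical event kinds:
      "open" / "close"  start or end a structural tag such as tree or branch
      "text"            a chunk of document text
      "para_end"        end of a paragraph; value is the paragraph style name,
                        written as an unpaired style tag plus an unpaired break
    """
    out = []
    for kind, value in events:
        if kind == "open":
            out.append(f"<{value}>")
        elif kind == "close":
            out.append(f"</{value}>")
        elif kind == "text":
            out.append(escape(value))
        elif kind == "para_end":
            out.append(f"<style name={quoteattr(value)}/><br/>")
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return "".join(out)


events = [
    ("open", "tree"),
    ("text", "A Collection of Heretical Ravings"),
    ("para_end", "Title"),
    ("open", "branch"),
    ("text", "Word is like a dirty bum."),
    ("para_end", "Body Text"),
    ("text", "However, Word is also like building a pyramid. "),
    ("close", "branch"),
    ("text", "If you lay the foundations right, you can reach for the sky."),
    ("close", "tree"),
]
print(export_chunks(events))
```

Because the paragraph boundary is only an empty marker, the branch element is free to close in the middle of what Word would treat as a paragraph.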



Dodgy XML - XSLT
===============

Embed bookmark pairs and use post-processing (a clever XSLT will do
this for you) to kill the para objects and translate your bookmarks
into tags.

Eg bookmark names:

Start Tag: _myStem_TagName_S_1
End Tag: _myStem_TagName_E_1

The first _ makes these bookmarks 'hidden'. The second marks the end of
myStem, which you use in the XSLT to identify the bookmarks that need
translation. The third marks the end of the tag name that will be
embedded. The S or E that follows can then be read to produce either
nothing or a leading / on the tag, which switches it between opening
and closing. The final underscore and number are thrown away in
translation, so you can add numerous instances of the same tag.
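
The naming scheme can be tried out with a small Python sketch before committing to an XSLT. This swaps Python in for the XSLT step purely for illustration. The function name `bookmark_to_tag` is hypothetical, and the pattern assumes that neither the stem nor the tag name contains an underscore.

```python
import re

# _<stem>_<TagName>_<S or E>_<instance number>
# Assumption: stem and tag name contain no underscores themselves.
BOOKMARK = re.compile(
    r"^_(?P<stem>[^_]+)_(?P<tag>[^_]+)_(?P<kind>[SE])_\d+$"
)


def bookmark_to_tag(name, stem="myStem"):
    """Translate one bookmark name into an opening or closing tag.

    Returns None for bookmarks that do not follow the scheme or that use a
    different stem, so ordinary bookmarks are left alone.
    """
    match = BOOKMARK.match(name)
    if match is None or match.group("stem") != stem:
        return None
    slash = "" if match.group("kind") == "S" else "/"
    return f"<{slash}{match.group('tag')}>"


print(bookmark_to_tag("_myStem_TagName_S_1"))
print(bookmark_to_tag("_myStem_TagName_E_1"))
```

The trailing number never reaches the output, so `_myStem_branch_S_1` and `_myStem_branch_S_2` both translate to the same opening tag.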

GL, and let me know how you get on. I am crawling up the nostrils of a
guy responsible for our XML capabilities and the more juice you can
give me the better.


Steve Hudson - Word Heretic

steve from wordheretic.com (Email replies require payment)
Without prejudice


 
