MS Word 2003 and DITA/DocBook

Jose Valdes

I use Word 2003 to write manuals and I am researching whether XML can help
me single-source some content between my manuals. By the way, Word 2007 is
not an option right now because it has not been approved by the IT
department of my company.

In "Office 2003 XML for Power Users," Matthew MacDonald suggests "adopting an
industry-standard XML markup that is already defined for your data" instead
of "crafting your own XML markup." He mentions DocBook specifically, which
seems up my alley since I write manuals. DITA, however, is the trendy
markup.

Any opinions as to which is better? Is it technically possible to use
DITA/DocBook in Word? For example, DocBook uses DTD, but Word seems to want
XSD. Is there any benefit from DITA/DocBook when used in Word? On the plus
side, DITA/DocBook provide markup tags and some standard style sheets. A
minus might be that these markups want you to apply these tags to every
element of these manuals such as headings, notes, warnings, and paragraphs;
if you use WordprocessingML, Word applies the markup automatically for you.

Thanks!
Jose

My apologies to the readers of "word.docmanagement" if this post isn't quite
a match to your group. I wanted to post to a group that is specific to Word,
but couldn't think of a better fit.
 
Peter Flynn

Jose said:
I use Word 2003 to write manuals and I am researching whether XML can help
me single-source some content between my manuals.

Yes, definitely.

By the way, Word 2007 is
not an option right now because it has not been approved by the IT
department of my company.

In "Office 2003 XML for Power Users," Matthew MacDonald suggests "adopting an
industry-standard XML markup that is already defined for your data" instead
of "crafting your own XML markup." He mentions DocBook specifically, which
seems up my alley since I write manuals. DITA, however, is the trendy
markup.

DocBook is the de facto standard document type for computing
documentation; DITA is more of an architecture, and is an OASIS standard.

Any opinions as to which is better?

Both do pretty much the same job: the choice is usually based on other
considerations like local organisational/IT support, industry sector
acceptance, cost of software/training, etc.

Is it technically possible to use DITA/DocBook in Word?

Yes.

For example, DocBook uses DTD, but Word seems to want XSD.

Word's XML interface really wasn't designed for text documents, but for
data, which is why they don't support DTDs (plus the original idea for
Schemas came partly from within Microsoft, so they don't feel any need
to support DTDs which are more commonly used in text document publishing).

Is there any benefit from DITA/DocBook when used in Word?

The "when used in Word" is redundant. There are lots of benefits to
using XML, whether DITA, DocBook, TEI, or whatever document type.

On the plus
side, DITA/DocBook provide markup tags and some standard style sheets. A
minus might be that these markups want you to apply these tags to every
element of these manuals such as headings, notes, warnings, and paragraphs;

Yes, that's what a document type is for. It specifies markup for certain
things that are important in a document: you are indeed expected to use
them. You have to call a heading a heading, a note a note, a warning a
warning, and so on. Calling a heading "14pt vertical space, no indent,
28pt on 32pt Arial Bold, 10pt vertical space" is pointless and
unproductive if you are trying to make a reusable and meaningful document.

if you use WordprocessingML, Word applies the markup automatically for you.

You mean _guess_ what you are typing? How does it know I'm typing a note
and not a warning? :) This is NOT automatic markup: that's a different
thing entirely.

But the kind of markup it applies doesn't label the elements for what
they are, only for what they look like...as above. Even if you use smart
stylesheets and named styles, you'll still be using Word's internal
document model which is flat. It has no hierarchy, so it has no concept
of containment (section containing subsection containing subsubsection
containing warning containing heading-followed-by-paragraph). You can
kludge your way around it, but it'll still be a flat document, and will
need very substantial massaging to be reusable or addressable.
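
To make the "flat" point concrete, here is a minimal sketch in Python of the kind of massaging involved. It assumes a hypothetical export of a Word document as a flat list of (style name, text) pairs; the `Section` class, the `nest` function, and the sample data are all invented for illustration and do not come from Word or any real library. The hierarchy has to be inferred from the heading levels, because the flat model never stores it:

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    """One node of the hierarchy that the flat document never stored."""
    title: str
    level: int
    paragraphs: list = field(default_factory=list)
    children: list = field(default_factory=list)


# Hypothetical flat export: every paragraph is a sibling of every other.
FLAT = [
    ("Heading 1", "Installation"),
    ("Normal", "Unpack the widget."),
    ("Heading 2", "Safety"),
    ("Normal", "Do not drop it."),
    ("Heading 1", "Operation"),
    ("Normal", "Press the button."),
]


def nest(flat):
    """Rebuild containment from heading levels using a stack of open sections."""
    root = Section("root", 0)
    stack = [root]
    for style, text in flat:
        if style.startswith("Heading "):
            level = int(style.split()[1])
            # Close every open section at the same or a deeper level.
            while stack[-1].level >= level:
                stack.pop()
            section = Section(text, level)
            stack[-1].children.append(section)
            stack.append(section)
        else:
            # Body text belongs to the innermost open section.
            stack[-1].paragraphs.append(text)
    return root
```

Calling `nest(FLAT)` gives a tree in which "Safety" is contained in "Installation". Note what the sketch cannot recover: nothing in a "Normal" paragraph says whether "Do not drop it." was meant as a warning or a note, which is exactly the information a document type would have captured.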

The associated problem is that Word's XML editor isn't a particularly
good one by comparison with the rest. There are others which can be
customised for the document type to do what you describe as "automatic
markup", and they may offer many other benefits for the technical writer
(as well as drawbacks like cost or the need for approval).

Personally, I use Emacs, but I'm well known for that kind of thing :)

///Peter
 
Jose Valdes

Peter,

Thanks so much for replying. I hope my fledgling XML questions have not
tested your patience. ;-)

Forgive my lack of clarity when I wrote, "when used in MS Word." I meant to
say that my completed manuals have to take the form of MS Word 2003 files. I
can use a document type such as DocBook or DITA, but the finished document
must be MS Word. My company is not interested in XML and I would have a hard
time making the case for investing in any additional tools such as
specialized editors.

I hope I am not being dense, but I'm having trouble understanding how I
could use DITA, DocBook, or another document type to produce Word files.
Scenario One: I could import a schema from DITA/DocBook into the Word Schema
Library, and use Word to apply XML elements to my existing manuals. Within
Word, I can perform XML transformations based on these schemas, which would
allow me to take advantage of single sourcing features. If I understand it
correctly, this scenario is possible for DITA, but not DocBook because the
latter uses DTD instead of XSD. Is that right?

Scenario Two: edit XML files using an XML editor, perform all
transformations outside of Word, and import final version of manuals into
Word. In this scenario, I would not really be using Word.

Scenario Three: use IncludeText fields in Word files to take advantage of
the XML transformation available in these fields. This scenario is very
similar to the first one, and would probably exclude DocBook for the same
reasons, I think. ;-)

Are any of these scenarios feasible or worthwhile? Am I overlooking a more
sensible scenario? For now, I think I am going to focus on scenario one and
read up on DITA.

Thanks!
Jose
 
Peter Flynn

Jose said:
Peter,

Thanks so much for replying. I hope my fledgling XML questions have not
tested your patience. ;-)

Forgive my lack of clarity when I wrote, "when used in MS Word." I meant to
say that my completed manuals have to take the form of MS Word 2003 files. I
can use a document type such as DocBook or DITA, but the finished document
must be MS Word. My company is not interested in XML and I would have a hard
time making the case for investing in any additional tools such as
specialized editors.

OK. You mean you'd have a hard time expecting *them* to invest in an XML
editor (cheaper than Word :). I assume you yourself would be using one:
I have a hard time imagining people editing XML text in Notepad...

I hope I am not being dense, but I'm having trouble understanding how
I could use DITA, DocBook, or another document type to produce Word
files.

This is a common path, and the simplest way is to cheat. Write your doc
in DocBook or DITA or whatever. Write an XSLT transformation to very
carefully constructed XHTML with whatever styling is needed embedded in
a style element in CSS in the header. Rename the output file to end in
.doc and Word will open it as if it were a native .doc file, and your
company will be none the wiser.
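
A minimal sketch of that route, for illustration only. It uses plain Python (standard library) in place of a real XSLT stylesheet, and it handles just a handful of DocBook-like elements. The sample document, the CSS, and the names `render`, `to_word_html`, `write_doc` and `manual.doc` are all assumptions made up for the example:

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from xml.sax.saxutils import escape

# Hypothetical, heavily simplified DocBook-like source.
SOURCE = """<article>
  <title>Widget Manual</title>
  <section>
    <title>Installation</title>
    <para>Unpack the widget.</para>
    <warning><para>Do not drop it.</para></warning>
  </section>
</article>"""

# All styling lives in one place, embedded in the XHTML head.
CSS = """
h1 { font: bold 28pt Arial; }
h2 { font: bold 18pt Arial; }
p.warning { border: 1pt solid red; padding: 4pt; }
"""


def render(node, depth=1):
    """Return a list of XHTML fragments for the children of one element."""
    out = []
    for child in node:
        if child.tag == "title":
            out.append(f"<h{depth}>{escape(child.text or '')}</h{depth}>")
        elif child.tag == "para":
            out.append(f"<p>{escape(child.text or '')}</p>")
        elif child.tag in ("warning", "note"):
            # The semantic element name becomes a CSS class.
            for para in child.findall("para"):
                out.append(f'<p class="{child.tag}">{escape(para.text or "")}</p>')
        elif child.tag == "section":
            out.extend(render(child, depth + 1))
    return out


def to_word_html(xml_text):
    """Turn the source XML into one XHTML page with embedded CSS."""
    root = ET.fromstring(xml_text)
    title = escape(root.findtext("title") or "")
    body = "\n".join(render(root))
    return (
        '<html xmlns="http://www.w3.org/1999/xhtml">\n'
        f'<head><title>{title}</title>\n'
        f'<style type="text/css">{CSS}</style></head>\n'
        f'<body>\n{body}\n</body></html>\n'
    )


def write_doc(xml_text, path="manual.doc"):
    """Write the XHTML under a .doc name so Word opens it directly."""
    Path(path).write_text(to_word_html(xml_text), encoding="utf-8")
```

Calling `write_doc(SOURCE)` would produce `manual.doc` in the current directory. How faithfully Word honours the CSS is something to check in the Word version your readers actually use.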

I have several clients doing this all the time. The individuals (being
professional documentation engineers) refuse to use Word because it
lacks the depth required, so this device enables them to continue Doing
It Right, but lets their users continue to believe in Word. Where the
client needs PDF, they generate it from the XML. Ditto with HTML.

The "official" path is to write a (10000x more complex) XSLT
transformation to turn your DocBook/DITA into OOXML. This is perfectly
possible, just highly error-prone because of the complexity involved
(OOXML is an XML representation of how a Word file *looks*, rather than
what it *means*).
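
To give a flavour of what "how it looks" means, here is a small, hedged sketch that emits a document in the XML vocabulary Word 2003 itself reads, WordprocessingML (the 2003 flavour, which predates the OOXML package format introduced with Word 2007). The helper names are invented for the example, only a bold flag is supported, and the output has not been tested against Word 2003, so treat it as an illustration of the markup style rather than a working converter:

```python
from xml.sax.saxutils import escape

# Namespace of the Word 2003 WordprocessingML vocabulary.
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"


def wordml_paragraph(text, bold=False):
    """One paragraph holding one run; formatting is all the markup records."""
    run_props = "<w:rPr><w:b/></w:rPr>" if bold else ""
    return f"<w:p><w:r>{run_props}<w:t>{escape(text)}</w:t></w:r></w:p>"


def wordml_document(paragraphs):
    """Wrap (text, bold) pairs in a minimal w:wordDocument."""
    body = "".join(wordml_paragraph(text, bold) for text, bold in paragraphs)
    return (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
        '<?mso-application progid="Word.Document"?>\n'
        f'<w:wordDocument xmlns:w="{W_NS}"><w:body>{body}</w:body></w:wordDocument>'
    )


# A heading and a warning both end up as "bold paragraph": the meaning is gone.
DOC = wordml_document([
    ("Installation", True),
    ("Unpack the widget.", False),
    ("Do not drop it.", True),
])
```

The heading and the warning are indistinguishable in the output, which is the reason a transformation in this direction has to encode every semantic distinction as formatting, and why it grows so large.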

Scenario One: I could import a schema from DITA/DocBook into the Word Schema
Library, and use Word to apply XML elements to my existing manuals. Within
Word, I can perform XML transformations based on these schemas, which would
allow me to take advantage of single sourcing features. If I understand it
correctly, this scenario is possible for DITA, but not DocBook because the
latter uses DTD instead of XSD. Is that right?

No. DocBook is defined and maintained in ISO RELAX NG, so it is
available in W3C Schema format as well as in DTD format. You could
continue editing DocBook documents using a DTD, and provided the
document is valid, remove the DOCTYPE Declaration and use the document
with the corresponding Schema.
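
Removing the DOCTYPE Declaration can be scripted. The following is a rough sketch in Python, assuming a simple declaration (optionally with an internal subset in square brackets); the function name `strip_doctype` and the sample document are made up for the example, and a regular expression like this will not cope with every legal DOCTYPE:

```python
import re
import xml.etree.ElementTree as ET

# Matches "<!DOCTYPE ... >" with an optional internal subset "[ ... ]".
DOCTYPE_RE = re.compile(r"<!DOCTYPE[^\[>]*(\[.*?\])?\s*>\s*", re.DOTALL)


def strip_doctype(xml_text):
    """Return the document with its first DOCTYPE declaration removed."""
    return DOCTYPE_RE.sub("", xml_text, count=1)


# Hypothetical DocBook 4.5 document that was edited against the DTD.
SAMPLE = (
    '<?xml version="1.0"?>\n'
    '<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"\n'
    '  "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">\n'
    "<article><title>Widget Manual</title></article>"
)

stripped = strip_doctype(SAMPLE)
# Confirm the result is still well-formed XML before handing it on.
root = ET.fromstring(stripped)
```

One caveat: any named entities that were declared by the DTD stop resolving once the declaration is gone, so a document that relies on them would need those references replaced first.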

Scenario Two: edit XML files using an XML editor, perform all
transformations outside of Word, and import final version of manuals into
Word. In this scenario, I would not really be using Word.

Right. That's what I'm suggesting above, except that if you open the
HTML file in Word, and Save As... .doc or OOXML, it seems automagically
to become a Word document. But test it in their version first.

Scenario Three: use IncludeText fields in Word files to take advantage of
the XML transformation available in these fields. This scenario is very
similar to the first one, and would probably exclude DocBook for the same
reasons, I think. ;-)

Are any of these scenarios feasible or worthwhile? Am I overlooking a more
sensible scenario? For now, I think I am going to focus on scenario one and
read up on DITA.

Try the HTML route and see.

///Peter
 
Jose Valdes

Thanks Peter! I'll try the HTML route!

 
