Thought I'd follow up on my original posting with our findings thus far...
On June 13, Microsoft released a critical patch (917336)
<http://support.microsoft.com/kb/917334> to address a remote code execution
vulnerability in Word 2003. Although unrelated to the vulnerability, this
patch also included an update of the RTF specification:
<http://support.microsoft.com/kb/922681/en-us?spid=2530>. One of the changes
in version 1.8 is that Word inserts right-to-left character keywords far
more frequently, even for text which does not include any R-to-L content.
Our testing shows that this change in behavior 'breaks' editing of Word 2003
created RTF with Word 2004, because while Word 2004 supports Unicode, it
does *not* support right-to-left scripts. These rampant 'rtlch' keywords,
when interpreted by Word 2004, manifest as the seemingly random application
of the 'hidden' character attribute, as well as potential substitution of
white space characters (space, paragraph marks, soft returns, etc.) with
unknown '??' characters.
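For anyone who wants to check whether a given RTF file carries these keywords, here is a minimal Python sketch that counts the 'rtlch' and 'ltrch' control words in the raw RTF text. The function name and the sample string are my own illustration, not anything from Word or Microsoft, and it does not try to handle embedded binary (\bin) data.

```python
import re
from collections import Counter

# One RTF token: a control word (letters plus optional numeric parameter)
# or a control symbol (backslash plus any single character, e.g. \\ or \{).
# Consuming control symbols keeps an escaped backslash from being misread
# as the start of a control word.
TOKEN = re.compile(r"\\(?:([a-zA-Z]+)(-?\d+)?|.)", re.DOTALL)

def count_control_words(rtf_text, names=("rtlch", "ltrch")):
    """Count occurrences of the given control words (hypothetical helper)."""
    counts = Counter()
    for match in TOKEN.finditer(rtf_text):
        word = match.group(1)  # None for control symbols
        if word in names:
            counts[word] += 1
    return {name: counts[name] for name in names}

# Made-up sample, not taken from a real Word 2003 file.
sample = r"{\rtf1\ansi \rtlch\fcs1 \af0 \ltrch\fcs0 Hello \\rtlch literal}"
result = count_control_words(sample)
```

To run it on a real file, read it as text first, for example open(path, encoding="latin-1").read(), and compare the counts for a file saved before the patch against one saved after.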
So at this point, cross-platform editing of RTF _from_ Word 2003 _to_ Word
2004 is not possible until Microsoft updates the Word 2004 RTF importer to
support v1.8 of the specification. The only alternative is to use native DOC
file format as the cross-platform interchange format (which is ironic
considering that's the role RTF was supposed to fill).
As an aside, Word v.X is unaffected by the RTF v1.8 changes, possibly
_because_ it doesn't have native Unicode support.
I've even gone as far as using the Remove Hidden Data tool in Word 2003
to export a 'final' version of the file, yet I am still experiencing problems
with 'hidden' attributes being applied seemingly at random. If it were just
happening to me (on my own system) I would discount it as a user
configuration error, but it's also happening for two of my associates _and_
for a large client... It's a problem.
To follow up on that second test... In delving further into the file, there
are apparently some passages of text which have the hidden character
attribute (hence the dotted underline and their disappearance when turning
off 'show hidden text'), but when selecting the text and looking at the
character formatting, 'Hidden' is _not_ applied (even selecting the text and
choosing 'clear formatting' doesn't remove the hidden attribute). Any ideas
on that too?
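One way to narrow this down is to look at the RTF itself. In RTF, hidden text is switched on with the \v control word and off with \v0, so a rough Python sketch like the one below can report how often each appears in a file. The helper name and sample are hypothetical, and this is only a diagnostic: if a file contains no \v at all and text still shows as hidden in Word 2004, that would suggest the importer rather than the file.

```python
import re

# One RTF token: a control word (letters plus optional numeric parameter)
# or a control symbol (backslash plus any single character).
TOKEN = re.compile(r"\\(?:([a-zA-Z]+)(-?\d+)?|.)", re.DOTALL)

def hidden_toggles(rtf_text):
    """Return (times hidden is turned on, times it is turned off).

    Hypothetical helper: counts \\v (on) and \\v0 (off) control words only.
    Longer words that merely start with 'v' are matched whole and skipped.
    """
    on = off = 0
    for match in TOKEN.finditer(rtf_text):
        if match.group(1) != "v":
            continue
        if match.group(2) == "0":
            off += 1
        else:
            on += 1
    return on, off

# Made-up sample, not taken from a real Word 2003 file.
sample = r"{\rtf1 {\v secret}\v0 shown \vertalt}"
toggles = hidden_toggles(sample)
```

Note that \v can also appear inside style definitions in the stylesheet group, so a non-zero count does not by itself mean body text is hidden.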
On 9/26/06 9:16 AM, in article (e-mail address removed), "Caleb Clauset"
I don't think it's the fonts. This happens for files where I have the same
OpenType (PostScript) font installed on both Mac/Win or when the file is
using the default Times/Times NR fonts. Track changes doesn't appear to be
it either as it hasn't been used on these documents (or if it has, there
are no un-accepted changes).
As a simple test I created a new file in Word 2003 with two lines. The
first is styled as "Header 1" with the last word styled with the "Emphasis"
character style (both styles are built-in). Save the file as RTF, copy to
my Mac, open in Word 2004 and the file appears empty (except the status bar
at the bottom of the window estimates 14 characters in the file). Toggle
show hidden and the Header 1 line appears with the dotted underline
indicating it has a 'hidden' character format.
As a second test, I took an existing file and reset all the formatting back
to the base (Times NR 12 pt, single space, left align, no indent, auto
color). Open that in Word 2003, save as RTF, copy to Mac, open in Word 2004
and nearly all content is visible except that three paragraph styles now
have 'hidden' as part of their formatting.
I'd be more than happy to provide you (or anyone else) with sample files.
It's definitely a major annoyance right now and we're desperate for a
workaround.
On 9/26/06 8:08 AM, in article (e-mail address removed), "John McGhie [MVP - Word
The first place I would look would be the Font in use.
Word 2004 supports Unicode fonts, Word X does not. However, both Mac
versions (and Word 2003) run in Unicode internally.
I would be surprised if it's "styled" text that is the problem. A "Style"
in Word is simply a binary pointer to a property table. If the style is
good in one part of the document, it will be good everywhere.
This really sounds like a character set issue, which is why I am wondering
about the fonts. I would also investigate to see if Tracked Changes have
been used during the editing process. If they have, low-level document
corruption is quite likely, unless the users know what they are doing.
There were a couple of Windows patches affecting the WMF and RTF
mechanisms. RTF, as you know, is an evolving standard. Word 2003 is a
level ahead, and thus capable of producing RTF objects that Word 2004
should ignore. But that's what it should do: ignore it, not mark it
"hidden".
I would carefully examine the hidden text to see exactly which styles are
in play (there may be several: one for the paragraph, a character style, a
table style etc...).
Word 2003 also creates "Linked Styles" if the 2003 user has left "Keep
Track of Formatting" enabled. Try turning that off
(Tools>Options>Edit...). Try round-tripping the document to HTML or XML
before saving to RTF. That will clean the internal structure up.
But the first place I would look is tracked changes: try "Accept All
Changes In Document" before saving to RTF.
On 26/9/06 9:36 PM, in article (e-mail address removed), "Caleb Clauset"
Just trying to figure out what's changed that is causing RTF files
created in Word 2003 (Windows) to suddenly have arbitrary text take on the
'Hidden' character format when opening the files with Word 2004 (Mac).
Open the same file with Word v.X and no problems. We're running Word 2004
(11.2.3; 060202) and Word 2003 (SP2; 11.8026.8036).
We've also had a couple of files (again RTF) where not only has random
text become 'hidden' (huge chunks, not just individual characters), but
white space characters also show up as '??' (white space being both
spaces and paragraph returns). Not all white space characters are
'corrupted'; it tends to happen to text that has been extensively styled
(i.e., the para style is more than just adjusting the typeface, point
size, alignment, and/or basic line spacing).
If anyone has experienced either of these problems and knows how to fix
this, I'd be most grateful. Everything had been working fine before so
we're not sure what changed (and are hoping that it's something that we
can simply toggle on/off). And no, switching to .DOC is not an option; we
need to stick with RTF.