First line in rtf file and font

Hans L

I have a seemingly random and vexing problem. Sometimes, when I send an rtf
or doc file to a client (translations), the Swedish letters åäöÅÄÖ are
replaced by Asian-looking characters.

What usually happens is that "deff#" in the first line of the file is *not*
"deff0", but, e.g., "deff17" (deff# is the font, I understand). What I do
not understand is why this happens.

In the hope that someone with experience in these matters can provide a
clue, or even a solution, I will present below the first line of the rtf files I
have looked at with UltraEdit (the text editor). There are two sets of four
files, and the file names explain who sent which file to whom.


Additional Text_1_Sent by client.rtf

{\rtf1\adeflang1025\ansi\ansicpg1252\uc1\adeff0\deff0\stshfdbch13\stshfloch0\stshfhich0\stshfbi0\deflang1033\deflangfe2052{\fonttbl{\f0\froman\fcharset0\fprq2{\*\panose
02020603050405020304}Times New Roman;}{\f1\fswiss\fcharset0\fprq2{\*\panose
020b0604020202020204}Arial;}

Additional Text_2_Sent by me.rtf

{\rtf1\ansi\ansicpg1252\uc1
\deff17\deflang1033\deflangfe1033{\fonttbl{\f0\froman\fcharset0\fprq2{\*\panose
02020603050405020304}Times New Roman;}{\f1\fswiss\fcharset0\fprq2{\*\panose
020b0604020202020204}Arial;}

Additional Text_3_Sent by client.rtf

{\rtf1\adeflang1025\ansi\ansicpg1252\uc1\adeff0\deff0\stshfdbch0\stshfloch0\stshfhich0\stshfbi0\deflang1033\deflangfe1033{\fonttbl{\f0\froman\fcharset0\fprq2{\*\panose
02020603050405020304}Times New Roman;}{\f1\fswiss\fcharset0\fprq2{\*\panose
020b0604020202020204}Arial;}

Additional Text_4_Sent by me.rtf

{\rtf1\adeflang1025\ansi\ansicpg1252\uc1\adeff0\deff0\stshfdbch0\stshfloch0\stshfhich0\stshfbi0\deflang1033\deflangfe1033{\fonttbl{\f0\froman\fcharset0\fprq2{\*\panose
02020603050405020304}Times New Roman;}{\f1\fswiss\fcharset0\fprq2{\*\panose
020b0604020202020204}Arial;}


As you can see, the first file from the client to me had "deff0". The file
I sent back had "deff17", but the Swedish letters came out alright!!! They
were also okay in file 3 and 4 ("deff0" in both cases).



= = = = = = = = =


Extracted Text from Bioplate PDF_1_Sent by client.rtf

{\rtf1\adeflang1025\ansi\ansicpg1252\uc1\adeff0\deff0\stshfdbch13\stshfloch0\stshfhich0\stshfbi0\deflang1033\deflangfe2052{\fonttbl{\f0\froman\fcharset0\fprq2{\*\panose
02020603050405020304}Times New Roman;}{\f1\fswiss\fcharset0\fprq2{\*\panose
020b0604020202020204}Arial;}

Extracted Text from Bioplate PDF_2_Sent by me.rtf

{\rtf1\ansi\ansicpg1252\uc1
\deff0\deflang1033\deflangfe1033{\fonttbl{\f0\froman\fcharset0\fprq2{\*\panose
02020603050405020304}Times New Roman;}{\f1\fswiss\fcharset0\fprq2{\*\panose
020b0604020202020204}Arial;}

Extracted Text from Bioplate PDF_3_Sent by client.rtf

{\rtf1\adeflang1025\ansi\ansicpg1252\uc1\adeff0\deff0\stshfdbch0\stshfloch0\stshfhich0\stshfbi0\deflang1033\deflangfe1033{\fonttbl{\f0\froman\fcharset0\fprq2{\*\panose
02020603050405020304}Times New Roman;}{\f1\fswiss\fcharset0\fprq2{\*\panose
020b0604020202020204}Arial;}


Extracted Text from Bioplate PDF_4_Sent by me.rtf (Swedish letters åäöÅÄÖ =
Asian characters)

{\rtf1\ansi\ansicpg1252\uc1
\deff17\deflang1033\deflangfe1033{\fonttbl{\f0\froman\fcharset0\fprq2{\*\panose
02020603050405020304}Times New Roman;}{\f1\fswiss\fcharset0\fprq2{\*\panose
020b0604020202020204}Arial;}


For this file, it was "deff0" for files 1-3, but file 4 had "deff17", and
the Swedish letters were screwed up.

- - - - -

I don't know if it is possible for anyone to figure out what makes "deff#"
switch from 0 to 17 (and sometimes to other numbers), and in one case having
the Swedish letters come out okay and in the other not. But perhaps there
are clues that will help someone figure out, at least in principle, what has
happened, which might help me figure out how to avoid the problem.

Thank you for your consideration.

Hans L
 
sam

This looks to me like a field code. Select the lines and press Alt+F9, or
alternatively right-click in the sentence and select Toggle Field
Codes.
 
Hans L

I meant to tell that I am using Word 2000 & Win XP Home. I do not know what
my client uses.

Also, I have searched the Net up and down, but cannot find any list of
what fonts deff0, deff1, deff2, etc. stand for.

Hans L
 
Bob Buckland ?:-)

Hi Hans,

The \deff<N> keyword sets the document's default font; <N> matches a font number (\f<N>) in the font table \fonttbl in the document.

Each version of Word gets a bit of an upgrade to the RTF spec. In theory, RTF 'readers' (like Word) are to ignore RTF elements that they don't know, but such is not always the case.

For Word 2000 the RTF spec is v1.6, a Unicode-enabled version, and Word 2007 is v1.9. The \deff keyword has been in since version 1.0 (which the spec defined as being for use with "Microsoft MS-DOS(R), Windows(tm), OS/2(R), and Apple(R) Macintosh(R) applications") <g>.
Unfortunately, since only the major revision is shown in the version number in the file, all RTF documents start off with \rtf1 (all version 1 <g>).

You'll find copies of each of the RTF specifications and tips on working with RTF on http://technet.microsoft.com and
http://sourceforge.net

From just the snippets you provided I'm not sure all of the data is there, but in your last example the default Far East language (\deflangfe) changed from 2052 to 1033, which is U.S. English.

Peter Jamieson should be by shortly. He's pretty much a wizard at reading RTF and may spot something else.

--

Bob Buckland ?:)
MS Office System Products MVP

*Courtesy is not expensive and can pay big dividends*
 
Hans L

Bob, I was just going to try to find a way to get in touch with MS when I saw
your message (I do not get e-mail notifications, no matter what I do :-( ).
Great to hear from you.

All rtf text above is the entire first line when I look at the rtf files
with UltraEdit (text editor).

I will check out http://technet.microsoft.com and
http://sourceforge.net, although I have already checked out
http://msdn2.microsoft.com/en-us/library/aa140301(office.10).aspx without
getting too much info (read: without understanding too much :)

I hope that you are right in that Peter Jamieson will come by. I am deeply
over my head here, but I need to understand why these things happen, because
it affects my livelihood.

Thank you again,

Hans L
 
Bob Buckland ?:-)

Hi Hans,

To make things in your RTF file a bit easier to read in UltraEdit you may want to download the RTF WordList from
http://ultraedit.com and then paste the content into Wordlist.txt
(Advanced=>Configuration=>Syntax Highlighting [Open], using a copy of the file).

Then, open a backup copy of your RTF file and in UE use
Search=>Replace

Find What }{
Replace with }^p{
[those are curly braces in the example]

(Use a backup copy, as the added paragraph breaks will show up as new paragraphs in Word if you reopen the RTF file.)
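
For anyone without UltraEdit, the same search-and-replace can be sketched in a few lines of Python. This is only an illustration for inspecting a file: the function names split_groups and write_readable_copy are made up for this example, and the output should go to a separate copy, never over the original.

```python
def split_groups(rtf_text: str) -> str:
    """Start a new line wherever one RTF group is directly followed by another."""
    return rtf_text.replace("}{", "}\n{")


def write_readable_copy(src_path: str, dst_path: str) -> None:
    """Read src_path and write a line-broken copy to dst_path (hypothetical helper)."""
    # latin-1 maps every byte to one character, so the file round-trips unchanged
    with open(src_path, encoding="latin-1", newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="latin-1", newline="") as dst:
        dst.write(split_groups(text))


# Shortened, illustrative font table rather than a complete RTF file
sample = r"{\fonttbl{\f0\froman Times New Roman;}{\f1\fswiss Arial;}}"
print(split_groups(sample))
```

With the sample above, each font entry after the first ends up on its own line, which is the effect the UltraEdit replace is meant to give.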

You may also want to find out what version of Word the person you're exchanging files with is using, and what languages are
enabled in his version.


Hans L

Thanks for the advice, Bob. Now, I have looked like crazy for the RTF Wordlist
on the IDM site, but I cannot find it. Is it possibly called something else
or ...?

I did ask the client for what version of Word they used, but have gotten no
response yet. Enabled languages – hm, what is that going to tell me?

Regards,

Hans L
 
Bob Buckland ?:-)

Hi Hans,

The RTF WordFile and tag lists are on
http://ultraedit.com/index.php?name=Content&pa=showpage&pid=40

It appears that some of your RTF file snippets had multiple languages in the content. The problem you're having with
international characters could come from applying the wrong language setting to specific text in Word.

Hans L

Okay, I did all the above, and I now get all f1-f... on different lines.
Helps a lot!

Here is what I have:

In Additional Text_2_Sent by me.rtf there is no "adeflang", while in all
the other Additional Text_# files there is an "adeflang1025". I
do not yet know what "adeflang" is. What is interesting is that in 2, "deff"
is "f17", which normally would have come out as SimSun, but in this case it
did not!!! The Swedish letters are okay.

This is contrary to Extracted Text from Bioplate PDF_4_Sent by me.rtf
(Swedish letters åäöÅÄÖ = Asian characters), where "deff" is "f17" and the
Swedish letters are indeed SimSun characters.

The only clue I can see that might explain why I got SimSun in 4, but not
in 2, is the "deflangfe" value in each file:

1: deflangfe2052
2: deflangfe1033
3: deflangfe1033
4: deflangfe1033

In other words, when the client sent me a file with deflangfe2052, my return
file was okay, but when I got a deflangfe1033 back from the client, my return
file became SimSun. Have no idea if this makes sense, and I do not know what
deflangfe is (will check).

Regards,

Hans L
 
Bob Buckland ?:-)

Hi Hans,

Yes, that is what I was suspecting: switching the language ID for displaying Far East characters to U.S. English
(\deflangfe1033) might be what was contributing to you getting the incorrect results. Peter Jamieson has said he would be able to
take a look at this thread. He's more 'fluent' in RTF :) 2052 is People's Republic of China, but both choices are a bit at odds
with working with Swedish text characters (LCID 1053) <g>.

\adeflang is the bidirectional (alternate direction) language choice for a document, which if I recall would be more likely to appear
in later versions of Word RTF than 2000. In your case \adeflang1025 would be for Arabic - Saudi Arabia. The appearance of some of
the language coding in the document may not mean that it was intentionally used in the document, but that Complex Script (right-to-left)
languages have been enabled in the copy of Word that was creating the document.
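
To compare these language settings across several files without reading each header by eye, something along the following lines might help. It is a rough Python sketch: the function name default_languages and the small LCID table are assumptions for this example, and the table only covers the four language IDs mentioned in this thread.

```python
import re

# Only the LCIDs discussed in this thread; anything else is reported as unknown
LCID_NAMES = {
    1025: "Arabic (Saudi Arabia)",
    1033: "English (United States)",
    1053: "Swedish (Sweden)",
    2052: "Chinese (PRC)",
}

# Longer keywords come first so that \deflangfe is not read as \deflang
LANG_KEYWORD = re.compile(r"\\(adeflang|deflangfe|deflang)(\d+)")


def default_languages(header: str) -> dict[str, str]:
    """Map each default-language keyword in an RTF header to a language name."""
    found = {}
    for keyword, number in LANG_KEYWORD.findall(header):
        lcid = int(number)
        # Keep the first occurrence of each keyword
        found.setdefault(keyword, LCID_NAMES.get(lcid, f"unknown LCID {lcid}"))
    return found


# Start of the first line of "Additional Text_1_Sent by client.rtf" as posted above
header = (r"{\rtf1\adeflang1025\ansi\ansicpg1252\uc1\adeff0\deff0\stshfdbch13"
          r"\stshfloch0\stshfhich0\stshfbi0\deflang1033\deflangfe2052{\fonttbl")
for keyword, language in default_languages(header).items():
    print(keyword, "=", language)
```

Running the same function on the first line of each file in a set would show at a glance which files carry \deflangfe2052 and which carry \deflangfe1033.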

Hans L

Well, Bob, I hope we are on to something. I'll hold off a little to give
Peter time to look at this post (I really hope he will have the time) before
I start trying to do anything to alleviate this problem.

Thanks for your help so far!

Hans L
 
Peter Jamieson

Hello Hans and Bob,

Sorry it took me so long - not Bob's fault! Nor am I quite the whizz with
RTF that you might have hoped :) Just a few initial thoughts...

Point one is that the font table starting \fonttbl simply assigns font
numbers to a number of fonts defined using certain characteristics. In the
remainder of the RTF document, the font defined as \deff17 will be
referenced as \f17 and so on. But there is nothing magical about "17"
itself, and the same font could be referenced by a different \deff number in
different documents. Word does set up a number of fonts in the \fonttbl by
default, and in practice they may well be invariant between different
instances of Word, but if you open a document, change it, and save it, there
is no reason why, in theory, Word might not completely reorganise the font
table.

So what does \deff17 in your

Additional Text_4_Sent by me.rtf

actually say?
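
One way to answer that kind of question without scrolling through the font table is sketched below in Python: read the \deffN number, then look up the matching \fN entry in \fonttbl. The function name default_font and the regular expressions are illustrative assumptions. The sample header is also partly made up: its \f17 entry is invented to match Hans's report that f17 is SimSun, because the snippets posted in this thread stop before that entry.

```python
import re


def default_font(rtf: str):
    """Return (N, font name) for the \\deffN default font, or None if there is no \\deff."""
    deff = re.search(r"\\deff(\d+)", rtf)
    if deff is None:
        return None
    number = int(deff.group(1))
    # Match the {\fN ...} entry, allowing one level of nested groups such as {\*\panose ...}
    entry = re.search(r"\{\\f%d(?!\d)((?:[^{}]|\{[^{}]*\})*)\}" % number, rtf)
    if entry is None:
        return number, None
    body = re.sub(r"\{[^{}]*\}", "", entry.group(1))   # drop nested groups
    body = re.sub(r"\\[a-zA-Z]+-?\d* ?", "", body)     # drop control words like \fcharset134
    return number, body.strip().rstrip(";").strip()


# Illustrative header: the \f17 entry below is an assumption, not taken from Hans's files
sample = (r"{\rtf1\ansi\ansicpg1252\uc1\deff17\deflang1033\deflangfe1033"
          r"{\fonttbl{\f0\froman\fcharset0\fprq2{\*\panose 02020603050405020304}"
          r"Times New Roman;}{\f17\fnil\fcharset134\fprq2 SimSun;}}}")
print(default_font(sample))
```

The sketch only handles font entries with at most one level of nested groups, which is what the entries quoted in this thread look like.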

Second, when a Windows program such as Word tries to use a font in Windows
using the "Windows GDI" (Graphics Device Interface) it selects a font based
on a number of criteria, and interestingly enough, the "Facename" (Arial,
Times New Roman) etc. is, or at least was, not the first in the list. This
dates back to the time before TrueType etc. when fonts were typically tied
to a very small pre-Unicode character set. I believe it searches using the
following sequence:
Character set
Pitch
Family (e.g. Decorative Modern, Roman, Script, Swiss)
Facename
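
As a rough illustration of the first three criteria, the Python sketch below pulls the character set, pitch and family out of a single \fonttbl entry. The function name describe_font and the lookup tables are assumptions for this example; the tables list only a few common values, and the sketch does not attempt to reproduce how GDI itself picks a font.

```python
import re

# A few common \fcharset values; not a complete list
CHARSETS = {0: "ANSI", 2: "Symbol", 128: "Shift JIS", 134: "GB2312",
            136: "Big5", 178: "Arabic", 238: "Eastern European"}
# \fprq values
PITCHES = {0: "default", 1: "fixed", 2: "variable"}
# Font family control words
FAMILIES = {"fnil": "unknown", "froman": "Roman", "fswiss": "Swiss",
            "fmodern": "Modern", "fscript": "Script", "fdecor": "Decorative",
            "ftech": "Technical", "fbidi": "Bidirectional"}


def describe_font(entry: str) -> dict:
    """Decode charset, pitch and family from one font table entry."""
    charset = re.search(r"\\fcharset(\d+)", entry)
    pitch = re.search(r"\\fprq(\d+)", entry)
    family = re.search(r"\\(" + "|".join(FAMILIES) + r")(?![a-z])", entry)
    return {
        "charset": CHARSETS.get(int(charset.group(1)), "other") if charset else None,
        "pitch": PITCHES.get(int(pitch.group(1)), "other") if pitch else None,
        "family": FAMILIES[family.group(1)] if family else None,
    }


# The Arial entry from the files quoted earlier in this thread
arial = r"{\f1\fswiss\fcharset0\fprq2{\*\panose 020b0604020202020204}Arial;}"
print(describe_font(arial))
```

If the character set really is checked before the face name, an entry whose \fcharset is 134 rather than 0 would be a natural suspect when Latin letters turn into Chinese ones.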

I also see that in 3 out of the 4 files "sent by you" you have
{\rtf1\ansi\ansicpg1252\uc1
rather than
{\rtf1\adeflang1025\ansi\ansicpg1252\uc1

\adeflang is AFAIK an RTF 1.9 (Word 2007) keyword that specifies the
"Default language ID for South Asian/Middle Eastern text in Word. The
default languages are determined by the current primary editing language and
the enabled editing languages (can be changed via Microsoft Office Language
Settings applet)." So I would guess that this keyword is only added if the
user is using Word 2007 or perhaps the compatibility pack. It could be that
you are using Word 2003 or saving as Word 2003 compatible format.

1025 actually specifies "Arabic (Saudi Arabia)" I think. However, I do not
know whether this setting will come into play unless text is marked
explicitly as being in a South Asian/Middle Eastern language, and off the
top of my head I can't tell you how that would be done in RTF. I think this
is more to do with the /human language/ being used than the script. \deflang
defines something similar for all text in the document marked as \plain: e.g.
\deflang1033 is English (U.S.), and \deflangfe does a similar job for East
Asian text - e.g. \deflangfe2052 is Chinese (PRC). I suppose it could be
significant that one of the files sent to you has \deflangfe2052 and others
have \deflangfe1033.
 
Hans L

Peter, I am very sorry that it took ME so long to get back here. I cannot
get notification to work consistently, not even when I use a newsreader
(XanaNews) (although in this case, I used the web interface). I do not know
how others remember what they post when they get no notification.

I will print your post and go through it carefully, and then get back.

Again, sorry for my lateness.

Regards,

Hans L
 
