The mystery of the "Convert File" dialog box and em dashes

Kevin_Stauffer

Version: 2008
Operating System: Mac OS X 10.5 (Leopard)
Processor: Intel

I have two .docx files that I've converted to plain text files. When you try to open the text files, one always brings up the "Convert File" dialog box; the other does not. There does not appear to be any difference in the two files.

The real issue is that the one that brings up the "Convert File" dialog box (call it "Doc 1") displays em dashes correctly. The other file ("Doc 2") does not. Instead, it displays each one as a hyphen. I need the text-only files to show em dashes as em dashes.

A little more background: I have an AppleScript that "massages" .docx files, which includes converting them to plain text before placing the files into an InDesign document. Along the way, it converts two hyphens to an em dash (I know that Word has a preference for this as well, but our Word documents come from several different writers).

Interestingly, I did one test where I pasted a sentence that contained an em dash from Doc 2 into Doc 1. When I converted this file via AppleScript to plain text, the "Convert File" dialog box appeared and, subsequently, the em dash appeared as an em dash. So there must be some hidden difference between Doc 1 and Doc 2 ... I just can't find what it is.

Any help will be greatly appreciated.

Thanks,
Kevin
 
John McGhie

Hi Kevin:

Have a look in a Hex editor: I suspect you'll find that one of those files
is in Unicode, the other in MacRoman :)
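
If a hex editor is not to hand, the same check can be sketched in a few lines of Python. This is only a rough sketch: the names `guess_text_encoding` and `hex_preview` are made up for illustration, the heuristics are assumptions rather than anything Word documents, and a file containing nothing but ASCII bytes looks identical in MacRoman and UTF-8, so no tool can tell those apart.

```python
import codecs
import sys

def guess_text_encoding(data: bytes) -> str:
    """Make a rough guess at how a plain text file's bytes are encoded."""
    # A byte-order mark at the start is the clearest sign of a Unicode file.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8 (with BOM)"
    if data.startswith(codecs.BOM_UTF16_LE) or data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16"
    # Pure 7-bit data reads the same in ASCII, MacRoman and UTF-8.
    if all(b < 0x80 for b in data):
        return "ascii (no way to tell further)"
    # High bytes that do not form valid UTF-8 suggest an 8-bit encoding.
    try:
        data.decode("utf-8")
        return "utf-8 (no BOM)"
    except UnicodeDecodeError:
        return "8-bit, possibly MacRoman"

def hex_preview(data: bytes, count: int = 16) -> str:
    """Show the first few bytes the way a hex editor would."""
    return " ".join(f"{b:02X}" for b in data[:count])

if __name__ == "__main__":
    # Pass the text files to compare as command-line arguments.
    for path in sys.argv[1:]:
        with open(path, "rb") as handle:
            raw = handle.read()
        print(path, "|", hex_preview(raw), "|", guess_text_encoding(raw))
```

Running it against both text files side by side should show at a glance whether one starts with a byte-order mark and the other does not.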

Cheers


--
Don't wait for your answer, click here: http://www.word.mvps.org/

Please reply in the group. Please do NOT email me unless I ask you to.

John McGhie, Microsoft MVP, Word and Word:Mac
Sydney, Australia. mailto:[email protected]
 
Kevin_Stauffer

Hi John

Thanks for the response. What you are saying makes sense, although I must admit that I could not figure out how to get the Hex editors I downloaded to reveal the file format.

However, I did resolve the problem by tweaking my AppleScript code. I added a line of code that converts all em dashes to two hyphens (--); then, after doing a "save as" plain text, the script converts all "--" to em dashes. This makes the em dash appear in all the plain text files, which is what I was after.
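
The order of operations in that workaround can be sketched in Python (the actual script is AppleScript and is not shown in this thread). Everything here is hypothetical: the function names are invented, and `simulate_lossy_save` is only a stand-in for a save step that cannot represent the em dash, since the thread does not establish what Word actually does internally.

```python
EM_DASH = "\u2014"

def protect_em_dashes(text: str) -> str:
    """Step 1: swap each em dash for two hyphens, which survive any encoding."""
    return text.replace(EM_DASH, "--")

def simulate_lossy_save(text: str) -> str:
    """Step 2 (stand-in): a save that drops anything outside 7-bit ASCII."""
    return text.encode("ascii", errors="replace").decode("ascii")

def restore_em_dashes(text: str) -> str:
    """Step 3: turn every pair of hyphens back into an em dash."""
    return text.replace("--", EM_DASH)

original = "One thought" + EM_DASH + "then another -- and a third."

# Without protection, the em dash is lost in the save step.
unprotected = simulate_lossy_save(original)

# With protection, the hyphens pass through and are restored afterwards.
round_tripped = restore_em_dashes(simulate_lossy_save(protect_em_dashes(original)))

print(repr(unprotected))
print(repr(round_tripped))
```

Two caveats with this approach: any "--" already in the text also becomes an em dash (which is what the script wants here), and the final file still has to be written in an encoding that can hold the em dash.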

Thanks again!
Kevin
 
John McGhie

MacRoman characters are eight bits long: 0 to 255.

Unicode characters, as stored in UTF-16, are 16 bits long: 0 to 65,535 :)
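
To make the difference concrete, here is a small Python sketch that prints the bytes an em dash (U+2014) turns into under each encoding. The helper name `hex_bytes` is invented for illustration; the codec names are the ones in Python's standard library. These are the byte patterns to look for in a hex editor.

```python
EM_DASH = "\u2014"

def hex_bytes(text: str, encoding: str) -> str:
    """Encode text and show the resulting bytes as hex pairs."""
    return " ".join(f"{b:02X}" for b in text.encode(encoding))

# One byte in MacRoman, two in UTF-16, three in UTF-8.
for encoding in ("mac_roman", "utf-16-be", "utf-8"):
    print(f"{encoding:10} {hex_bytes(EM_DASH, encoding)}")

# mac_roman  D1
# utf-16-be  20 14
# utf-8      E2 80 94
```

A plain hyphen, by contrast, is the single byte 2D in MacRoman and UTF-8.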

Cheers


This email is my business email -- Please do not email me about forum
matters unless you intend to pay!

--

John McGhie, Microsoft MVP (Word, Mac Word), Consultant Technical Writer,
McGhie Information Engineering Pty Ltd
Sydney, Australia. | Ph: +61 (0)4 1209 1410
+61 4 1209 1410, mailto:[email protected]
 
