problem importing UTF-8 txt file


BillSeitz

Version: 2008
Operating System: Mac OS X 10.5 (Leopard)
Processor: intel

I have a txt file prepared by a vendor via an automated process. It's delivered in UTF-8 encoding and contains non-ASCII characters (accented characters, curly quotes, etc.).

I'm able to open these documents fine in Word on Windows.

I'm also able to open them fine on the Mac with OpenOffice or TextEdit.

But I cannot seem to find an option for opening such documents in Word-2008 that doesn't result in some sort of munging of non-ascii characters.

A co-worker has the same issue in Mac-Word-2004.

I chanced upon the following workaround:
* open in TextEdit
* SaveAs to UTF-16 format
* open in Word, accepting its initial Unicode choice of "Unicode 5.0 (Little-Endian)". It now looks good.
* save as UTF-8.

I'd rather not ask people to remember these steps, but I hope they give some hint as to what's going on.
 

thg

There are two kinds of UTF-8 files: with and without the BOM (Byte Order Mark, the three bytes EF BB BF) at the beginning of the file. MS products require the BOM in order to recognize UTF-8, and the UTF-8 they produce has a BOM. Apple and probably most other products produce UTF-8 without a BOM.

Technically no BOM is required for UTF-8, but MS has adopted the convention of using its presence to distinguish that encoding from the OS default.
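
If the BOM really is what Word is looking for, one way to avoid the TextEdit round trip might be to prepend it to the vendor file before opening it. Here is a minimal Python sketch of that idea; the function names and file paths are made up for illustration, and I haven't confirmed that Word 2008 or 2004 will pick up the encoding from the BOM, so treat it as something to try rather than a guaranteed fix.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the bytes start with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def add_utf8_bom(data: bytes) -> bytes:
    """Return the same bytes with a UTF-8 BOM prepended, if not already present."""
    # Fail early (UnicodeDecodeError) if the input is not valid UTF-8,
    # so we never stamp a BOM onto a file in some other encoding.
    data.decode("utf-8")
    if has_utf8_bom(data):
        return data
    return codecs.BOM_UTF8 + data


def add_bom_to_file(src_path: str, dst_path: str) -> None:
    """Copy src_path to dst_path, adding a UTF-8 BOM (hypothetical helper)."""
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(add_utf8_bom(data))


if __name__ == "__main__":
    # Placeholder file names; substitute the real vendor file.
    add_bom_to_file("vendor.txt", "vendor-with-bom.txt")
```

The file is handled as raw bytes, so the text itself is left untouched; only the three marker bytes are added at the front. Running it twice on the same file does not add a second BOM.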
 
