Fast conversion HTML entities to chars. How?

R

Robix

Hi,

I have a text file containing HTML entities (both named and numbered). I
need to convert them to characters they represent before the file goes to
translation and then convert characters back to their HTML entities after
translation. Does anyone know the good (fast) method to do this?

Also I'd like to ask this. It is a LISP file and when I open it in the text
editor it shows it has ANSI Mac encoding. As I want to process it on
Windows, can some problems arise during file processing (I mean characters
will corrupt, line endings will change, etc.)? (I am writing macro for this
in Word 2003). I have no experience with processing Mac files on Windows. (I
need to mention that after translation the file will contain texts in
several languages - Western European, Central European and Turkish).

Thank you very much in advance,
Robert
 
K

Klaus Linke

(Sorry for the long post)

Hi Robix,

That's quite a bit you're asking ;-)

You could save as HTML to turn characters into numerical character references, or add <html>...</html> tags at the start/end of the text file and then open as HTML to turn numerical character references back to text. But it may be difficult to figure out which encoding to use, or to strip out the HTML tags that Word inserts.

For toggling a file back and forth between numerical character references and Unicode characters, I have some macros, Numrefs2Unichars and Unichars2Numrefs, that I'll post below.

One problem are symbols from symbol fonts. Your files don't contain any, so you might comment out the call to SymbolsUnprotect.

Another problem is keeping the formatting. That's not a problem in your case either, and I have some macros that work on unformatted text and are faster than those posted below.
But since the macros for formatted text are more general and aren't terribly slow, I'll post those.

As you guessed, if the files come from the Mac, that's another problem.
I usually open (AppleRoman) text files from the Mac with another macro (ImportMacBinary) that changes the problematic characters directly into Unicode numerical character references.
You'll need to paste or type the path and file name into the macro, or add an interface with a file picker.
If you have access to a Mac, you could also open the text file in a Mac browser and copy/paste from there. Or save as HTML, doc, or RTF on the Mac. But all these methods are very likely to mess up a couple of characters, and so I prefer the macro since that gives me full control.

If you use Central European and Turkish characters, you should make sure that the final recipient on the Mac is using Word2004. Earlier versions may not be able to handle the necessary Unicode characters, and you may need to convert to the proper old Mac code page, using special fonts...

I hope the macros work for you,
Klaus



Option Explicit

Sub Numref2Unichar()
'
Dim start
Dim var$, Var2$, VarVal, VarReplace$
Top:
Selection.HomeKey Unit:=wdStory
Selection.Find.ClearFormatting
Selection.Find.Replacement.ClearFormatting
With Selection.Find
.Text = "&#x[0-9a-fA-F]@;"
.Replacement.Text = "^&"
.Forward = True
.Wrap = wdFindContinue
.Format = True
.MatchCase = False
.MatchWholeWord = False
.MatchAllWordForms = False
.MatchSoundsLike = False
.MatchWildcards = True
End With
If Selection.Find.Execute Then
var$ = Selection.Text
Var2$ = Left$(var$, Len(var$) - 1)
Var2$ = "&H" & Mid$(Var2$, 4)

VarVal = CLng(Val(Var2$))
VarReplace$ = ChrW(VarVal)
If VarVal = 94 Then
VarReplace$ = VarReplace$ & "^"
End If
Selection.Find.ClearFormatting
Selection.Find.Replacement.ClearFormatting
With Selection.Find
.Text = var$
.Replacement.Text = VarReplace$
.MatchCase = True
.MatchWildcards = False
Select Case Val(Var2$)
Case &HF000 To &HF0FF
.Replacement.Font.Name = "Symbol"
Case Else
End Select
End With
Selection.Find.Execute Replace:=wdReplaceAll
Selection.EndKey Unit:=wdStory
Selection.TypeParagraph
Selection.TypeText Text:= _
Right$(Var2$, Len(Var2$) - 2) & vbTab & _
VarReplace$
Else
Numref2Unichar2
Exit Sub
End If
GoTo Top
MsgBox Timer - start
End Sub


Private Sub Numref2Unichar2()
'
Dim var$, Var2$, VarVal, VarReplace$
Top:
Selection.Find.ClearFormatting
Selection.Find.Replacement.ClearFormatting
With Selection.Find
.Text = "&#[0-9]@;"
.Replacement.Text = "^&"
.Forward = True
.Wrap = wdFindContinue
.Format = True
.MatchCase = False
.MatchWholeWord = False
.MatchAllWordForms = False
.MatchSoundsLike = False
.MatchWildcards = True
End With
If Selection.Find.Execute Then
var$ = Selection.Text
Var2$ = Left$(var$, Len(var$) - 1)
Var2$ = Mid$(Var2$, 3)
VarVal = CLng(Val(Var2$))
VarReplace$ = ChrW(VarVal)
With Selection.Find
.Text = var$
.Replacement.Text = VarReplace$
.MatchCase = True
.MatchWildcards = False
End With
Selection.Find.Execute Replace:=wdReplaceAll
Selection.EndKey Unit:=wdStory
Selection.TypeParagraph
Selection.TypeText Text:=Var2$ & vbTab & _
VarReplace$
Else
Exit Sub
End If
GoTo Top
End Sub


Sub Unichar2Numref()
'
Dim myRange As Range
Dim Cancel, c, Flag, Arial
Dim Code As Long
Dim MsgText As String
Dim Tag$, Ausgabe$, txt$
Call SymbolsUnprotect
Cancel = 0
' Word-Unicode in Tags
' First, combining diacritical marks (bug!!)
Selection.Find.ClearFormatting
Selection.Find.Replacement.ClearFormatting
With Selection.Find
.Text = "(?)[" & ChrW(&H300) & "-" & _
ChrW(&H361) & "]"
.Forward = True
.Wrap = wdFindContinue
.Format = False
.MatchCase = False
.MatchWholeWord = False
.MatchAllWordForms = False
.MatchSoundsLike = False
.MatchWildcards = True
End With
While Selection.Find.Execute
Selection.Text = Left(Selection.Text, 1) _
& ";&#x0" & Hex(AscW(Right(Selection.Text, 1))) & ";"
' Für Ausgabe am Ende der Datei:
Ausgabe$ = Right(Selection.Text, 5)
Ausgabe$ = Left(Ausgabe$, 4)
If InStr(1, txt, Ausgabe$) > 0 Then
Else
txt$ = txt$ + Ausgabe$ + vbCrLf
End If
Selection.Collapse (wdCollapseEnd)
Wend
'
Set myRange = ActiveDocument.Content
For Each c In myRange.Characters
' c.Select
Arial = 0
Code = Int2Long(AscW(c))
' Welche Zeichen sollen getaggt werden?
Flag = 0
If Code < 0 Then
Flag = 1
End If
If Code > 127 Then
' If Code < 191 Then
Flag = 1
' End If
End If
If Code > 126 Then
Flag = 1
' Decorative fonts?
If Code > Int2Long(&HF000) Then
If Code < Int2Long(&HF100) Then
If c.Font.Name <> "Arial" Then
Cancel = 0
End If
If Cancel <> vbOK Then
c.Select
MsgText = "Doc has characters from old symbol font " & _
c.Font.Name & vbCrLf
MsgText = MsgText & "Tag them in block U+F000 and continue?"
Cancel = MsgBox(MsgText, vbOKCancel + vbQuestion, "Options:")
If Cancel = vbCancel Then
End
Else
Selection.Find.ClearFormatting
Selection.Find.Replacement.ClearFormatting
Selection.Find.Font.Name = c.Font.Name
Selection.Find.Replacement.Font.Name = "Arial"
With Selection.Find
.Text = ""
.Replacement.Text = "^&"
.Forward = True
.Wrap = wdFindContinue
.Format = True
.MatchCase = True
.MatchWholeWord = False
.MatchWildcards = False
.MatchSoundsLike = False
.MatchAllWordForms = False
End With
Selection.Find.Execute Replace:=wdReplaceAll
End If
End If
Arial = 1
End If
End If
End If
If (Flag = 1) Then
' Zusammenbasteln des Tags:
Tag$ = Trim(Hex(Code))
While Len(Tag$) < 4
Tag$ = "0" + Tag$
Wend
Ausgabe$ = Tag$
Tag$ = "&#x" + Tag$ + ";"
' Alle Zeichen mit diesem Code in Tags ersetzen:
Selection.Find.ClearFormatting
Selection.Find.Replacement.ClearFormatting
With Selection.Find
.Text = ChrW(Code)
.Replacement.Text = Tag$
.Forward = True
.Wrap = wdFindContinue
.Format = False
.MatchCase = True
.MatchWholeWord = False
.MatchWildcards = False
.MatchSoundsLike = False
.MatchAllWordForms = False
End With
Selection.Find.Execute _
Replace:=wdReplaceAll
' Für Ausgabe am Ende der Datei:
txt$ = txt$ & Ausgabe$ & vbTab & _
ChrW(Val("&H" & Ausgabe$)) & vbCr
End If
Flag = 0
Next c
ActiveDocument.Content.Select
Selection.EndKey Unit:=wdStory
Selection.TypeParagraph
Selection.Font.Name = "Arial"
Selection.TypeText ("Unicode characters:")
Selection.TypeParagraph
Selection.InsertAfter txt$
Selection.Sort ExcludeHeader:=False, _
SortFieldType:=wdSortFieldAlphanumeric, _
SortOrder:=wdSortOrderAscending
End Sub

Private Function Int2Long(VarInt As Integer) As Long
Int2Long = VarInt
If Int2Long < 0 Then
Int2Long = Int2Long + 65536
End If
End Function

Private Sub SymbolsUnprotect()
'
Dim SelFont, SelCharNum

Selection.Collapse (wdCollapseStart)
Selection.Find.ClearFormatting
With Selection.Find
.Text = "[" & ChrW(61472) & "-" & _
ChrW(61695) & "]"
.Replacement.Text = ""
.Forward = True
.Wrap = wdFindContinue
.Format = False
.MatchCase = False
.MatchWholeWord = False
.MatchAllWordForms = False
.MatchSoundsLike = False
.MatchWildcards = True
End With
While Selection.Find.Execute
With Dialogs(wdDialogInsertSymbol)
SelFont = .Font
SelCharNum = .CharNum
End With

Selection.Font.Name = SelFont
Selection.TypeText Text:=ChrW(SelCharNum)

' replace the last 2 lines with the following
' to protect symbols from decorative fonts:
' Selection.InsertSymbol _
' Font:=SelFont, _
' CharacterNumber:=SelCharNum, _
' Unicode:=True

Wend
End Sub



Sub ImportMacBinary()

Dim sBuffer As String

Dim iFile As Integer

Dim i As Integer

Dim LngLOF As Long

iFile = FreeFile

StatusBar = "Set up buffer..."

sBuffer = String(2000000, "0")

' Documents.Add

Open "S:\Os2\typotext\FD012005.txt" For Binary As #iFile

StatusBar = "read file..."

LngLOF = LOF(iFile)

Do

If LngLOF < 1000000 Then

sBuffer = Input(LngLOF, #iFile)

LngLOF = 0

Else

sBuffer = Input(1000000, #iFile)

LngLOF = LngLOF - 1000000

End If

StatusBar = LngLOF

' Kontrollzeichen

For i = 0 To 8

sBuffer = Replace(sBuffer, ChrW(i), ChrW(i + &H2400))

Next i

For i = 10 To 12

sBuffer = Replace(sBuffer, ChrW(i), ChrW(i + &H2400))

Next i

For i = 14 To 31

sBuffer = Replace(sBuffer, ChrW(i), ChrW(i + &H2400))

Next i

StatusBar = "convert from Mac..."

' Apple Roman nach Unicode

sBuffer = Replace(sBuffer, Chr(128), "Ä", , , vbBinaryCompare)

' LATIN CAPITAL LETTER A WITH DIAERESIS

sBuffer = Replace(sBuffer, Chr(129), "Å", , , vbBinaryCompare)

' LATIN CAPITAL LETTER A WITH RING ABOVE

sBuffer = Replace(sBuffer, Chr(130), "Ç", , , vbBinaryCompare)

' LATIN CAPITAL LETTER C WITH CEDILLA

sBuffer = Replace(sBuffer, Chr(131), "É", , , vbBinaryCompare)

' LATIN CAPITAL LETTER E WITH ACUTE

sBuffer = Replace(sBuffer, Chr(132), "Ñ", , , vbBinaryCompare)

' LATIN CAPITAL LETTER N WITH TILDE

sBuffer = Replace(sBuffer, Chr(133), "Ö", , , vbBinaryCompare)

' LATIN CAPITAL LETTER O WITH DIAERESIS

sBuffer = Replace(sBuffer, Chr(134), "Ü", , , vbBinaryCompare)

' LATIN CAPITAL LETTER U WITH DIAERESIS

sBuffer = Replace(sBuffer, Chr(135), "á", , , vbBinaryCompare)

' LATIN SMALL LETTER A WITH ACUTE

sBuffer = Replace(sBuffer, Chr(136), "à", , , vbBinaryCompare)

' LATIN SMALL LETTER A WITH GRAVE

sBuffer = Replace(sBuffer, Chr(137), "â", , , vbBinaryCompare)

' LATIN SMALL LETTER A WITH CIRCUMFLEX

sBuffer = Replace(sBuffer, Chr(138), "ä", , , vbBinaryCompare)

' LATIN SMALL LETTER A WITH DIAERESIS

sBuffer = Replace(sBuffer, Chr(139), "ã", , , vbBinaryCompare)

' LATIN SMALL LETTER A WITH TILDE

sBuffer = Replace(sBuffer, Chr(140), "å", , , vbBinaryCompare)

' LATIN SMALL LETTER A WITH RING ABOVE

sBuffer = Replace(sBuffer, Chr(141), "ç", , , vbBinaryCompare)

' LATIN SMALL LETTER C WITH CEDILLA

sBuffer = Replace(sBuffer, Chr(142), "é", , , vbBinaryCompare)

' LATIN SMALL LETTER E WITH ACUTE

sBuffer = Replace(sBuffer, Chr(143), "è", , , vbBinaryCompare)

' LATIN SMALL LETTER E WITH GRAVE

sBuffer = Replace(sBuffer, Chr(144), "ê", , , vbBinaryCompare)

' LATIN SMALL LETTER E WITH CIRCUMFLEX

sBuffer = Replace(sBuffer, Chr(145), "ë", , , vbBinaryCompare)

' LATIN SMALL LETTER E WITH DIAERESIS

sBuffer = Replace(sBuffer, Chr(146), "í", , , vbBinaryCompare)

' LATIN SMALL LETTER I WITH ACUTE

sBuffer = Replace(sBuffer, Chr(147), "ì", , , vbBinaryCompare)

' LATIN SMALL LETTER I WITH GRAVE

sBuffer = Replace(sBuffer, Chr(148), "î", , , vbBinaryCompare)

' LATIN SMALL LETTER I WITH CIRCUMFLEX

sBuffer = Replace(sBuffer, Chr(149), "ï", , , vbBinaryCompare)

' LATIN SMALL LETTER I WITH DIAERESIS

sBuffer = Replace(sBuffer, Chr(150), "ñ", , , vbBinaryCompare)

' LATIN SMALL LETTER N WITH TILDE

sBuffer = Replace(sBuffer, Chr(151), "ó", , , vbBinaryCompare)

' LATIN SMALL LETTER O WITH ACUTE

sBuffer = Replace(sBuffer, Chr(152), "ò", , , vbBinaryCompare)

' LATIN SMALL LETTER O WITH GRAVE

sBuffer = Replace(sBuffer, Chr(153), "ô", , , vbBinaryCompare)

' LATIN SMALL LETTER O WITH CIRCUMFLEX

sBuffer = Replace(sBuffer, Chr(154), "ö", , , vbBinaryCompare)

' LATIN SMALL LETTER O WITH DIAERESIS

sBuffer = Replace(sBuffer, Chr(155), "õ", , , vbBinaryCompare)

' LATIN SMALL LETTER O WITH TILDE

sBuffer = Replace(sBuffer, Chr(156), "ú", , , vbBinaryCompare)

' LATIN SMALL LETTER U WITH ACUTE

sBuffer = Replace(sBuffer, Chr(157), "ù", , , vbBinaryCompare)

' LATIN SMALL LETTER U WITH GRAVE

sBuffer = Replace(sBuffer, Chr(158), "û", , , vbBinaryCompare)

' LATIN SMALL LETTER U WITH CIRCUMFLEX

sBuffer = Replace(sBuffer, Chr(159), "ü", , , vbBinaryCompare)

' LATIN SMALL LETTER U WITH DIAERESIS

sBuffer = Replace(sBuffer, ChrW(160), "†", , , vbBinaryCompare)

' DAGGER

sBuffer = Replace(sBuffer, ChrW(161), "°", , , vbBinaryCompare)

' DEGREE SIGN

sBuffer = Replace(sBuffer, ChrW(162), "¢", , , vbBinaryCompare)

' CENT SIGN

sBuffer = Replace(sBuffer, ChrW(163), "£", , , vbBinaryCompare)

' POUND SIGN

sBuffer = Replace(sBuffer, ChrW(164), "§", , , vbBinaryCompare)

' SECTION SIGN

sBuffer = Replace(sBuffer, ChrW(165), "•", , , vbBinaryCompare)

' BULLET

sBuffer = Replace(sBuffer, ChrW(166), "¶", , , vbBinaryCompare)

' PILCROW SIGN

sBuffer = Replace(sBuffer, ChrW(167), "ß", , , vbBinaryCompare)

' LATIN SMALL LETTER SHARP S

sBuffer = Replace(sBuffer, ChrW(168), "®", , , vbBinaryCompare)

' REGISTERED SIGN

sBuffer = Replace(sBuffer, ChrW(169), "©", , , vbBinaryCompare)

' COPYRIGHT SIGN

sBuffer = Replace(sBuffer, ChrW(170), "™", , , vbBinaryCompare)

' TRADE MARK SIGN

sBuffer = Replace(sBuffer, ChrW(171), "´", , , vbBinaryCompare)

' ACUTE ACCENT

sBuffer = Replace(sBuffer, ChrW(172), "¨", , , vbBinaryCompare)

' DIAERESIS

sBuffer = Replace(sBuffer, ChrW(173), "≠", , , vbBinaryCompare)

' NOT EQUAL TO

sBuffer = Replace(sBuffer, ChrW(174), "Æ", , , vbBinaryCompare)

' LATIN CAPITAL LETTER AE

sBuffer = Replace(sBuffer, ChrW(175), "Ø", , , vbBinaryCompare)

' LATIN CAPITAL LETTER O WITH STROKE

sBuffer = Replace(sBuffer, ChrW(176), "∞", , , vbBinaryCompare)

' INFINITY

sBuffer = Replace(sBuffer, ChrW(177), "±", , , vbBinaryCompare)

' PLUS-MINUS SIGN

sBuffer = Replace(sBuffer, ChrW(178), "≤", , , vbBinaryCompare)

' LESS-THAN OR EQUAL TO

sBuffer = Replace(sBuffer, ChrW(179), "≥", , , vbBinaryCompare)

' GREATER-THAN OR EQUAL TO

sBuffer = Replace(sBuffer, ChrW(180), "¥", , , vbBinaryCompare)

' YEN SIGN

sBuffer = Replace(sBuffer, ChrW(181), "µ", , , vbBinaryCompare)

' MICRO SIGN

sBuffer = Replace(sBuffer, ChrW(182), "∂", , , vbBinaryCompare)

' PARTIAL DIFFERENTIAL

sBuffer = Replace(sBuffer, ChrW(183), "∑", , , vbBinaryCompare)

' N-ARY SUMMATION

sBuffer = Replace(sBuffer, ChrW(184), "∏", , , vbBinaryCompare)

' N-ARY PRODUCT

sBuffer = Replace(sBuffer, ChrW(185), "π", , , vbBinaryCompare)

' GREEK SMALL LETTER PI

sBuffer = Replace(sBuffer, ChrW(186), "∫", , , vbBinaryCompare)

' INTEGRAL

sBuffer = Replace(sBuffer, ChrW(187), "ª", , , vbBinaryCompare)

' FEMININE ORDINAL INDICATOR

sBuffer = Replace(sBuffer, ChrW(188), "º", , , vbBinaryCompare)

' MASCULINE ORDINAL INDICATOR

sBuffer = Replace(sBuffer, ChrW(189), "Ω", , , vbBinaryCompare)

' GREEK CAPITAL LETTER OMEGA

sBuffer = Replace(sBuffer, ChrW(190), "æ", , , vbBinaryCompare)

' LATIN SMALL LETTER AE

sBuffer = Replace(sBuffer, ChrW(191), "ø", , , vbBinaryCompare)

' LATIN SMALL LETTER O WITH STROKE

sBuffer = Replace(sBuffer, ChrW(192), "¿", , , vbBinaryCompare)

' INVERTED QUESTION MARK

sBuffer = Replace(sBuffer, ChrW(193), "¡", , , vbBinaryCompare)

' INVERTED EXCLAMATION MARK

sBuffer = Replace(sBuffer, ChrW(194), "¬", , , vbBinaryCompare)

' NOT SIGN

sBuffer = Replace(sBuffer, ChrW(195), "√", , , vbBinaryCompare)

' SQUARE ROOT

sBuffer = Replace(sBuffer, ChrW(196), "ƒ", , , vbBinaryCompare)

' LATIN SMALL LETTER F WITH HOOK

sBuffer = Replace(sBuffer, ChrW(197), "≈", , , vbBinaryCompare)

' ALMOST EQUAL TO

sBuffer = Replace(sBuffer, ChrW(198), "∆", , , vbBinaryCompare)

' INCREMENT

sBuffer = Replace(sBuffer, ChrW(199), "«", , , vbBinaryCompare)

' LEFT-POINTING DOUBLE ANGLE QUOTATION MARK

sBuffer = Replace(sBuffer, ChrW(200), "»", , , vbBinaryCompare)

' RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK

sBuffer = Replace(sBuffer, ChrW(201), "…", , , vbBinaryCompare)

' HORIZONTAL ELLIPSIS

sBuffer = Replace(sBuffer, ChrW(202), " ", , , vbBinaryCompare)

' NO-BREAK SPACE

sBuffer = Replace(sBuffer, ChrW(203), "À", , , vbBinaryCompare)

' LATIN CAPITAL LETTER A WITH GRAVE

sBuffer = Replace(sBuffer, ChrW(204), "Ã", , , vbBinaryCompare)

' LATIN CAPITAL LETTER A WITH TILDE

sBuffer = Replace(sBuffer, ChrW(205), "Õ", , , vbBinaryCompare)

' LATIN CAPITAL LETTER O WITH TILDE

sBuffer = Replace(sBuffer, ChrW(206), "Œ", , , vbBinaryCompare)

' LATIN CAPITAL LIGATURE OE

sBuffer = Replace(sBuffer, ChrW(207), "œ", , , vbBinaryCompare)

' LATIN SMALL LIGATURE OE

sBuffer = Replace(sBuffer, ChrW(208), "–", , , vbBinaryCompare)

' EN DASH

sBuffer = Replace(sBuffer, ChrW(209), "—", , , vbBinaryCompare)

' EM DASH

sBuffer = Replace(sBuffer, ChrW(210), "“", , , vbBinaryCompare)

' LEFT DOUBLE QUOTATION MARK

sBuffer = Replace(sBuffer, ChrW(211), "”", , , vbBinaryCompare)

' RIGHT DOUBLE QUOTATION MARK

sBuffer = Replace(sBuffer, ChrW(212), "‘", , , vbBinaryCompare)

' LEFT SINGLE QUOTATION MARK

sBuffer = Replace(sBuffer, ChrW(213), "’", , , vbBinaryCompare)

' RIGHT SINGLE QUOTATION MARK

sBuffer = Replace(sBuffer, ChrW(214), "÷", , , vbBinaryCompare)

' DIVISION SIGN

sBuffer = Replace(sBuffer, ChrW(215), "◊", , , vbBinaryCompare)

' LOZENGE

sBuffer = Replace(sBuffer, ChrW(216), "ÿ", , , vbBinaryCompare)

' LATIN SMALL LETTER Y WITH DIAERESIS

sBuffer = Replace(sBuffer, ChrW(217), "Ÿ", , , vbBinaryCompare)

' LATIN CAPITAL LETTER Y WITH DIAERESIS

sBuffer = Replace(sBuffer, ChrW(218), "⁄", , , vbBinaryCompare)

' FRACTION SLASH

sBuffer = Replace(sBuffer, ChrW(219), "€", , , vbBinaryCompare)

' EURO SIGN

sBuffer = Replace(sBuffer, ChrW(220), "‹", , , vbBinaryCompare)

' SINGLE LEFT-POINTING ANGLE QUOTATION MARK

sBuffer = Replace(sBuffer, ChrW(221), "›", , , vbBinaryCompare)

' SINGLE RIGHT-POINTING ANGLE QUOTATION MARK

sBuffer = Replace(sBuffer, ChrW(222), "fi", , , vbBinaryCompare)

' LATIN SMALL LIGATURE FI

sBuffer = Replace(sBuffer, ChrW(223), "fl", , , vbBinaryCompare)

' LATIN SMALL LIGATURE FL

sBuffer = Replace(sBuffer, ChrW(224), "‡", , , vbBinaryCompare)

' DOUBLE DAGGER

sBuffer = Replace(sBuffer, ChrW(225), "·", , , vbBinaryCompare)

' MIDDLE DOT

sBuffer = Replace(sBuffer, ChrW(226), "‚", , , vbBinaryCompare)

' SINGLE LOW-9 QUOTATION MARK

sBuffer = Replace(sBuffer, ChrW(227), "„", , , vbBinaryCompare)

' DOUBLE LOW-9 QUOTATION MARK

sBuffer = Replace(sBuffer, ChrW(228), "‰", , , vbBinaryCompare)

' PER MILLE SIGN

sBuffer = Replace(sBuffer, ChrW(229), "Â", , , vbBinaryCompare)

' LATIN CAPITAL LETTER A WITH CIRCUMFLEX

sBuffer = Replace(sBuffer, ChrW(230), "Ê", , , vbBinaryCompare)

' LATIN CAPITAL LETTER E WITH CIRCUMFLEX

sBuffer = Replace(sBuffer, ChrW(231), "Á", , , vbBinaryCompare)

' LATIN CAPITAL LETTER A WITH ACUTE

sBuffer = Replace(sBuffer, ChrW(232), "Ë", , , vbBinaryCompare)

' LATIN CAPITAL LETTER E WITH DIAERESIS

sBuffer = Replace(sBuffer, ChrW(233), "È", , , vbBinaryCompare)

' LATIN CAPITAL LETTER E WITH GRAVE

sBuffer = Replace(sBuffer, ChrW(234), "Í", , , vbBinaryCompare)

' LATIN CAPITAL LETTER I WITH ACUTE

sBuffer = Replace(sBuffer, ChrW(235), "Î", , , vbBinaryCompare)

' LATIN CAPITAL LETTER I WITH CIRCUMFLEX

sBuffer = Replace(sBuffer, ChrW(236), "Ï", , , vbBinaryCompare)

' LATIN CAPITAL LETTER I WITH DIAERESIS

sBuffer = Replace(sBuffer, ChrW(237), "Ì", , , vbBinaryCompare)

' LATIN CAPITAL LETTER I WITH GRAVE

sBuffer = Replace(sBuffer, ChrW(238), "Ó", , , vbBinaryCompare)

' LATIN CAPITAL LETTER O WITH ACUTE

sBuffer = Replace(sBuffer, ChrW(239), "Ô", , , vbBinaryCompare)

' LATIN CAPITAL LETTER O WITH CIRCUMFLEX

sBuffer = Replace(sBuffer, ChrW(240), "", , , vbBinaryCompare)

' Apple logo

sBuffer = Replace(sBuffer, ChrW(241), "Ò", , , vbBinaryCompare)

' LATIN CAPITAL LETTER O WITH GRAVE

sBuffer = Replace(sBuffer, ChrW(242), "Ú", , , vbBinaryCompare)

' LATIN CAPITAL LETTER U WITH ACUTE

sBuffer = Replace(sBuffer, ChrW(243), "Û", , , vbBinaryCompare)

' LATIN CAPITAL LETTER U WITH CIRCUMFLEX

sBuffer = Replace(sBuffer, ChrW(244), "Ù", , , vbBinaryCompare)

' LATIN CAPITAL LETTER U WITH GRAVE

sBuffer = Replace(sBuffer, ChrW(245), "ı", , , vbBinaryCompare)

' LATIN SMALL LETTER DOTLESS I

sBuffer = Replace(sBuffer, ChrW(246), "ˆ", , , vbBinaryCompare)

' MODIFIER LETTER CIRCUMFLEX ACCENT

sBuffer = Replace(sBuffer, ChrW(247), "˜", , , vbBinaryCompare)

' SMALL TILDE

sBuffer = Replace(sBuffer, ChrW(248), "¯", , , vbBinaryCompare)

' MACRON

sBuffer = Replace(sBuffer, ChrW(249), "˘", , , vbBinaryCompare)

' BREVE

sBuffer = Replace(sBuffer, ChrW(250), "˙", , , vbBinaryCompare)

' DOT ABOVE

sBuffer = Replace(sBuffer, ChrW(251), "˚", , , vbBinaryCompare)

' RING ABOVE

sBuffer = Replace(sBuffer, ChrW(252), "¸", , , vbBinaryCompare)

' CEDILLA

sBuffer = Replace(sBuffer, ChrW(253), "˝", , , vbBinaryCompare)

' DOUBLE ACUTE ACCENT

sBuffer = Replace(sBuffer, ChrW(254), "˛", , , vbBinaryCompare)

' OGONEK

sBuffer = Replace(sBuffer, ChrW(255), "ˇ", , , vbBinaryCompare)

' CARON





Selection.InsertAfter sBuffer

Selection.Collapse (wdCollapseEnd)



Loop While LngLOF > 0

Close #iFile



StatusBar = "done..."

' Call Hex2Uni5

End Sub







Robix said:
Hi,

I have a text file containing HTML entities (both named and numbered). I
need to convert them to characters they represent before the file goes to
translation and then convert characters back to their HTML entities after
translation. Does anyone know the good (fast) method to do this?

Also I'd like to ask this. It is a LISP file and when I open it in the text
editor it shows it has ANSI Mac encoding. As I want to process it on
Windows, can some problems arise during file processing (I mean characters
will corrupt, line endings will change, etc.)? (I am writing macro for this
in Word 2003). I have no experience with processing Mac files on Windows.
(I need to mention that after translation the file will contain texts in
 
R

Robix

Hi Klaus,

Thank you very very much for your in depth reply and macros. I appreciate it
much. (I must admit I did not expect any reply already :) Of course I'll try
your macros.


Thank you very much again.
Have a nice day.
Robert


(Sorry for the long post)

Hi Robix,

That's quite a bit you're asking ;-)

You could save as HTML to turn characters into numerical character
references, or add <html>...</html> tags at the start/end of the text file
and then open as HTML to turn numerical character references back to text.
But it may be difficult to figure out which encoding to use, or to strip out
the HTML tags that Word inserts.

For toggling a file back and forth between numerical character references
and Unicode characters, I have some macros, Numrefs2Unichars and
Unichars2Numrefs, that I'll post below.

One problem are symbols from symbol fonts. Your files don't contain any, so
you might comment out the call to SymbolsUnprotect.

Another problem is keeping the formatting. That's not a problem in your case
either, and I have some macros that work on unformatted text and are faster
than those posted below.
But since the macros for formatted text are more general and aren't terribly
slow, I'll post those.

As you guessed, if the files come from the Mac, that's another problem.
I usually open (AppleRoman) text files from the Mac with another macro
(ImportMacBinary) that changes the problematic characters directly into
Unicode numerical character references.
You'll need to paste or type the path and file name into the macro, or add
an interface with a file picker.
If you have access to a Mac, you could also open the text file in a Mac
browser and copy/paste from there. Or save as HTML, doc, or RTF on the Mac.
But all these methods are very likely to mess up a couple of characters, and
so I prefer the macro since that gives me full control.

If you use Central European and Turkish characters, you should make sure
that the final recipient on the Mac is using Word2004. Earlier versions may
not be able to handle the necessary Unicode characters, and you may need to
convert to the proper old Mac code page, using special fonts...

I hope the macros work for you,
Klaus



Option Explicit

Sub Numref2Unichar()
'
Dim start
Dim var$, Var2$, VarVal, VarReplace$
Top:
Selection.HomeKey Unit:=wdStory
Selection.Find.ClearFormatting
Selection.Find.Replacement.ClearFormatting
With Selection.Find
.Text = "&#x[0-9a-fA-F]@;"
.Replacement.Text = "^&"
.Forward = True
.Wrap = wdFindContinue
.Format = True
.MatchCase = False
.MatchWholeWord = False
.MatchAllWordForms = False
.MatchSoundsLike = False
.MatchWildcards = True
End With
If Selection.Find.Execute Then
var$ = Selection.Text
Var2$ = Left$(var$, Len(var$) - 1)
Var2$ = "&H" & Mid$(Var2$, 4)

VarVal = CLng(Val(Var2$))
VarReplace$ = ChrW(VarVal)
If VarVal = 94 Then
VarReplace$ = VarReplace$ & "^"
End If
Selection.Find.ClearFormatting
Selection.Find.Replacement.ClearFormatting
With Selection.Find
.Text = var$
.Replacement.Text = VarReplace$
.MatchCase = True
.MatchWildcards = False
Select Case Val(Var2$)
Case &HF000 To &HF0FF
.Replacement.Font.Name = "Symbol"
Case Else
End Select
End With
Selection.Find.Execute Replace:=wdReplaceAll
Selection.EndKey Unit:=wdStory
Selection.TypeParagraph
Selection.TypeText Text:= _
Right$(Var2$, Len(Var2$) - 2) & vbTab & _
VarReplace$
Else
Numref2Unichar2
Exit Sub
End If
GoTo Top
MsgBox Timer - start
End Sub


Private Sub Numref2Unichar2()
'
Dim var$, Var2$, VarVal, VarReplace$
Top:
Selection.Find.ClearFormatting
Selection.Find.Replacement.ClearFormatting
With Selection.Find
.Text = "&#[0-9]@;"
.Replacement.Text = "^&"
.Forward = True
.Wrap = wdFindContinue
.Format = True
.MatchCase = False
.MatchWholeWord = False
.MatchAllWordForms = False
.MatchSoundsLike = False
.MatchWildcards = True
End With
If Selection.Find.Execute Then
var$ = Selection.Text
Var2$ = Left$(var$, Len(var$) - 1)
Var2$ = Mid$(Var2$, 3)
VarVal = CLng(Val(Var2$))
VarReplace$ = ChrW(VarVal)
With Selection.Find
.Text = var$
.Replacement.Text = VarReplace$
.MatchCase = True
.MatchWildcards = False
End With
Selection.Find.Execute Replace:=wdReplaceAll
Selection.EndKey Unit:=wdStory
Selection.TypeParagraph
Selection.TypeText Text:=Var2$ & vbTab & _
VarReplace$
Else
Exit Sub
End If
GoTo Top
End Sub


Sub Unichar2Numref()
'
Dim myRange As Range
Dim Cancel, c, Flag, Arial
Dim Code As Long
Dim MsgText As String
Dim Tag$, Ausgabe$, txt$
Call SymbolsUnprotect
Cancel = 0
' Word-Unicode in Tags
' First, combining diacritical marks (bug!!)
Selection.Find.ClearFormatting
Selection.Find.Replacement.ClearFormatting
With Selection.Find
.Text = "(?)[" & ChrW(&H300) & "-" & _
ChrW(&H361) & "]"
.Forward = True
.Wrap = wdFindContinue
.Format = False
.MatchCase = False
.MatchWholeWord = False
.MatchAllWordForms = False
.MatchSoundsLike = False
.MatchWildcards = True
End With
While Selection.Find.Execute
Selection.Text = Left(Selection.Text, 1) _
& ";&#x0" & Hex(AscW(Right(Selection.Text, 1))) & ";"
' Für Ausgabe am Ende der Datei:
Ausgabe$ = Right(Selection.Text, 5)
Ausgabe$ = Left(Ausgabe$, 4)
If InStr(1, txt, Ausgabe$) > 0 Then
Else
txt$ = txt$ + Ausgabe$ + vbCrLf
End If
Selection.Collapse (wdCollapseEnd)
Wend
'
Set myRange = ActiveDocument.Content
For Each c In myRange.Characters
' c.Select
Arial = 0
Code = Int2Long(AscW(c))
' Welche Zeichen sollen getaggt werden?
Flag = 0
If Code < 0 Then
Flag = 1
End If
If Code > 127 Then
' If Code < 191 Then
Flag = 1
' End If
End If
If Code > 126 Then
Flag = 1
' Decorative fonts?
If Code > Int2Long(&HF000) Then
If Code < Int2Long(&HF100) Then
If c.Font.Name <> "Arial" Then
Cancel = 0
End If
If Cancel <> vbOK Then
c.Select
MsgText = "Doc has characters from old symbol font " & _
c.Font.Name & vbCrLf
MsgText = MsgText & "Tag them in block U+F000 and continue?"
Cancel = MsgBox(MsgText, vbOKCancel + vbQuestion, "Options:")
If Cancel = vbCancel Then
End
Else
Selection.Find.ClearFormatting
Selection.Find.Replacement.ClearFormatting
Selection.Find.Font.Name = c.Font.Name
Selection.Find.Replacement.Font.Name = "Arial"
With Selection.Find
.Text = ""
.Replacement.Text = "^&"
.Forward = True
.Wrap = wdFindContinue
.Format = True
.MatchCase = True
.MatchWholeWord = False
.MatchWildcards = False
.MatchSoundsLike = False
.MatchAllWordForms = False
End With
Selection.Find.Execute Replace:=wdReplaceAll
End If
End If
Arial = 1
End If
End If
End If
If (Flag = 1) Then
' Zusammenbasteln des Tags:
Tag$ = Trim(Hex(Code))
While Len(Tag$) < 4
Tag$ = "0" + Tag$
Wend
Ausgabe$ = Tag$
Tag$ = "&#x" + Tag$ + ";"
' Alle Zeichen mit diesem Code in Tags ersetzen:
Selection.Find.ClearFormatting
Selection.Find.Replacement.ClearFormatting
With Selection.Find
.Text = ChrW(Code)
.Replacement.Text = Tag$
.Forward = True
.Wrap = wdFindContinue
.Format = False
.MatchCase = True
.MatchWholeWord = False
.MatchWildcards = False
.MatchSoundsLike = False
.MatchAllWordForms = False
End With
Selection.Find.Execute _
Replace:=wdReplaceAll
' Für Ausgabe am Ende der Datei:
txt$ = txt$ & Ausgabe$ & vbTab & _
ChrW(Val("&H" & Ausgabe$)) & vbCr
End If
Flag = 0
Next c
ActiveDocument.Content.Select
Selection.EndKey Unit:=wdStory
Selection.TypeParagraph
Selection.Font.Name = "Arial"
Selection.TypeText ("Unicode characters:")
Selection.TypeParagraph
Selection.InsertAfter txt$
Selection.Sort ExcludeHeader:=False, _
SortFieldType:=wdSortFieldAlphanumeric, _
SortOrder:=wdSortOrderAscending
End Sub

Private Function Int2Long(VarInt As Integer) As Long
Int2Long = VarInt
If Int2Long < 0 Then
Int2Long = Int2Long + 65536
End If
End Function

Private Sub SymbolsUnprotect()
'
Dim SelFont, SelCharNum

Selection.Collapse (wdCollapseStart)
Selection.Find.ClearFormatting
With Selection.Find
.Text = "[" & ChrW(61472) & "-" & _
ChrW(61695) & "]"
.Replacement.Text = ""
.Forward = True
.Wrap = wdFindContinue
.Format = False
.MatchCase = False
.MatchWholeWord = False
.MatchAllWordForms = False
.MatchSoundsLike = False
.MatchWildcards = True
End With
While Selection.Find.Execute
With Dialogs(wdDialogInsertSymbol)
SelFont = .Font
SelCharNum = .CharNum
End With

Selection.Font.Name = SelFont
Selection.TypeText Text:=ChrW(SelCharNum)

' replace the last 2 lines with the following
' to protect symbols from decorative fonts:
' Selection.InsertSymbol _
' Font:=SelFont, _
' CharacterNumber:=SelCharNum, _
' Unicode:=True

Wend
End Sub



Sub ImportMacBinary()

Dim sBuffer As String

Dim iFile As Integer

Dim i As Integer

Dim LngLOF As Long

iFile = FreeFile

StatusBar = "Set up buffer..."

sBuffer = String(2000000, "0")

' Documents.Add

Open "S:\Os2\typotext\FD012005.txt" For Binary As #iFile

StatusBar = "read file..."

LngLOF = LOF(iFile)

Do

If LngLOF < 1000000 Then

sBuffer = Input(LngLOF, #iFile)

LngLOF = 0

Else

sBuffer = Input(1000000, #iFile)

LngLOF = LngLOF - 1000000

End If

StatusBar = LngLOF

' Kontrollzeichen

For i = 0 To 8

sBuffer = Replace(sBuffer, ChrW(i), ChrW(i + &H2400))

Next i

For i = 10 To 12

sBuffer = Replace(sBuffer, ChrW(i), ChrW(i + &H2400))

Next i

For i = 14 To 31

sBuffer = Replace(sBuffer, ChrW(i), ChrW(i + &H2400))

Next i

StatusBar = "convert from Mac..."

' Apple Roman nach Unicode

sBuffer = Replace(sBuffer, Chr(128), "Ä", , , vbBinaryCompare)

' LATIN CAPITAL LETTER A WITH DIAERESIS

sBuffer = Replace(sBuffer, Chr(129), "Å", , , vbBinaryCompare)

' LATIN CAPITAL LETTER A WITH RING ABOVE

sBuffer = Replace(sBuffer, Chr(130), "Ç", , , vbBinaryCompare)

' LATIN CAPITAL LETTER C WITH CEDILLA

sBuffer = Replace(sBuffer, Chr(131), "É", , , vbBinaryCompare)

' LATIN CAPITAL LETTER E WITH ACUTE

sBuffer = Replace(sBuffer, Chr(132), "Ñ", , , vbBinaryCompare)

' LATIN CAPITAL LETTER N WITH TILDE

sBuffer = Replace(sBuffer, Chr(133), "Ö", , , vbBinaryCompare)

' LATIN CAPITAL LETTER O WITH DIAERESIS

sBuffer = Replace(sBuffer, Chr(134), "Ü", , , vbBinaryCompare)

' LATIN CAPITAL LETTER U WITH DIAERESIS

sBuffer = Replace(sBuffer, Chr(135), "á", , , vbBinaryCompare)

' LATIN SMALL LETTER A WITH ACUTE

sBuffer = Replace(sBuffer, Chr(136), "à", , , vbBinaryCompare)

' LATIN SMALL LETTER A WITH GRAVE

sBuffer = Replace(sBuffer, Chr(137), "â", , , vbBinaryCompare)

' LATIN SMALL LETTER A WITH CIRCUMFLEX

sBuffer = Replace(sBuffer, Chr(138), "ä", , , vbBinaryCompare)

' LATIN SMALL LETTER A WITH DIAERESIS

sBuffer = Replace(sBuffer, Chr(139), "ã", , , vbBinaryCompare)

' LATIN SMALL LETTER A WITH TILDE

sBuffer = Replace(sBuffer, Chr(140), "å", , , vbBinaryCompare)

' LATIN SMALL LETTER A WITH RING ABOVE

sBuffer = Replace(sBuffer, Chr(141), "ç", , , vbBinaryCompare)

' LATIN SMALL LETTER C WITH CEDILLA

sBuffer = Replace(sBuffer, Chr(142), "é", , , vbBinaryCompare)

' LATIN SMALL LETTER E WITH ACUTE

sBuffer = Replace(sBuffer, Chr(143), "è", , , vbBinaryCompare)

' LATIN SMALL LETTER E WITH GRAVE

sBuffer = Replace(sBuffer, Chr(144), "ê", , , vbBinaryCompare)

' LATIN SMALL LETTER E WITH CIRCUMFLEX

sBuffer = Replace(sBuffer, Chr(145), "ë", , , vbBinaryCompare)

' LATIN SMALL LETTER E WITH DIAERESIS

sBuffer = Replace(sBuffer, Chr(146), "í", , , vbBinaryCompare)

' LATIN SMALL LETTER I WITH ACUTE

sBuffer = Replace(sBuffer, Chr(147), "ì", , , vbBinaryCompare)

' LATIN SMALL LETTER I WITH GRAVE

sBuffer = Replace(sBuffer, Chr(148), "î", , , vbBinaryCompare)

' LATIN SMALL LETTER I WITH CIRCUMFLEX

sBuffer = Replace(sBuffer, Chr(149), "ï", , , vbBinaryCompare)

' LATIN SMALL LETTER I WITH DIAERESIS

sBuffer = Replace(sBuffer, Chr(150), "ñ", , , vbBinaryCompare)

' LATIN SMALL LETTER N WITH TILDE

sBuffer = Replace(sBuffer, Chr(151), "ó", , , vbBinaryCompare)

' LATIN SMALL LETTER O WITH ACUTE

sBuffer = Replace(sBuffer, Chr(152), "ò", , , vbBinaryCompare)

' LATIN SMALL LETTER O WITH GRAVE

sBuffer = Replace(sBuffer, Chr(153), "ô", , , vbBinaryCompare)

' LATIN SMALL LETTER O WITH CIRCUMFLEX

sBuffer = Replace(sBuffer, Chr(154), "ö", , , vbBinaryCompare)

' LATIN SMALL LETTER O WITH DIAERESIS

sBuffer = Replace(sBuffer, Chr(155), "õ", , , vbBinaryCompare)

' LATIN SMALL LETTER O WITH TILDE

sBuffer = Replace(sBuffer, Chr(156), "ú", , , vbBinaryCompare)

' LATIN SMALL LETTER U WITH ACUTE

sBuffer = Replace(sBuffer, Chr(157), "ù", , , vbBinaryCompare)

' LATIN SMALL LETTER U WITH GRAVE

sBuffer = Replace(sBuffer, Chr(158), "û", , , vbBinaryCompare)

' LATIN SMALL LETTER U WITH CIRCUMFLEX

sBuffer = Replace(sBuffer, Chr(159), "ü", , , vbBinaryCompare)

' LATIN SMALL LETTER U WITH DIAERESIS

sBuffer = Replace(sBuffer, ChrW(160), "†", , , vbBinaryCompare)

' DAGGER

sBuffer = Replace(sBuffer, ChrW(161), "°", , , vbBinaryCompare)

' DEGREE SIGN

sBuffer = Replace(sBuffer, ChrW(162), "¢", , , vbBinaryCompare)

' CENT SIGN

sBuffer = Replace(sBuffer, ChrW(163), "£", , , vbBinaryCompare)

' POUND SIGN

sBuffer = Replace(sBuffer, ChrW(164), "§", , , vbBinaryCompare)

' SECTION SIGN

sBuffer = Replace(sBuffer, ChrW(165), "•", , , vbBinaryCompare)

' BULLET

sBuffer = Replace(sBuffer, ChrW(166), "¶", , , vbBinaryCompare)

' PILCROW SIGN

sBuffer = Replace(sBuffer, ChrW(167), "ß", , , vbBinaryCompare)

' LATIN SMALL LETTER SHARP S

sBuffer = Replace(sBuffer, ChrW(168), "®", , , vbBinaryCompare)

' REGISTERED SIGN

sBuffer = Replace(sBuffer, ChrW(169), "©", , , vbBinaryCompare)

' COPYRIGHT SIGN

sBuffer = Replace(sBuffer, ChrW(170), "™", , , vbBinaryCompare)

' TRADE MARK SIGN

sBuffer = Replace(sBuffer, ChrW(171), "´", , , vbBinaryCompare)

' ACUTE ACCENT

sBuffer = Replace(sBuffer, ChrW(172), "¨", , , vbBinaryCompare)

' DIAERESIS

sBuffer = Replace(sBuffer, ChrW(173), "≠", , , vbBinaryCompare)

' NOT EQUAL TO

sBuffer = Replace(sBuffer, ChrW(174), "Æ", , , vbBinaryCompare)

' LATIN CAPITAL LETTER AE

sBuffer = Replace(sBuffer, ChrW(175), "Ø", , , vbBinaryCompare)

' LATIN CAPITAL LETTER O WITH STROKE

sBuffer = Replace(sBuffer, ChrW(176), "∞", , , vbBinaryCompare)

' INFINITY

sBuffer = Replace(sBuffer, ChrW(177), "±", , , vbBinaryCompare)

' PLUS-MINUS SIGN

sBuffer = Replace(sBuffer, ChrW(178), "≤", , , vbBinaryCompare)

' LESS-THAN OR EQUAL TO

sBuffer = Replace(sBuffer, ChrW(179), "≥", , , vbBinaryCompare)

' GREATER-THAN OR EQUAL TO

sBuffer = Replace(sBuffer, ChrW(180), "¥", , , vbBinaryCompare)

' YEN SIGN

sBuffer = Replace(sBuffer, ChrW(181), "µ", , , vbBinaryCompare)

' MICRO SIGN

sBuffer = Replace(sBuffer, ChrW(182), "∂", , , vbBinaryCompare)

' PARTIAL DIFFERENTIAL

sBuffer = Replace(sBuffer, ChrW(183), "∑", , , vbBinaryCompare)

' N-ARY SUMMATION

sBuffer = Replace(sBuffer, ChrW(184), "∏", , , vbBinaryCompare)

' N-ARY PRODUCT

sBuffer = Replace(sBuffer, ChrW(185), "π", , , vbBinaryCompare)

' GREEK SMALL LETTER PI

sBuffer = Replace(sBuffer, ChrW(186), "∫", , , vbBinaryCompare)

' INTEGRAL

sBuffer = Replace(sBuffer, ChrW(187), "ª", , , vbBinaryCompare)

' FEMININE ORDINAL INDICATOR

sBuffer = Replace(sBuffer, ChrW(188), "º", , , vbBinaryCompare)

' MASCULINE ORDINAL INDICATOR

sBuffer = Replace(sBuffer, ChrW(189), "Ω", , , vbBinaryCompare)

' GREEK CAPITAL LETTER OMEGA

sBuffer = Replace(sBuffer, ChrW(190), "æ", , , vbBinaryCompare)

' LATIN SMALL LETTER AE

sBuffer = Replace(sBuffer, ChrW(191), "ø", , , vbBinaryCompare)

' LATIN SMALL LETTER O WITH STROKE

sBuffer = Replace(sBuffer, ChrW(192), "¿", , , vbBinaryCompare)

' INVERTED QUESTION MARK

sBuffer = Replace(sBuffer, ChrW(193), "¡", , , vbBinaryCompare)

' INVERTED EXCLAMATION MARK

sBuffer = Replace(sBuffer, ChrW(194), "¬", , , vbBinaryCompare)

' NOT SIGN

sBuffer = Replace(sBuffer, ChrW(195), "√", , , vbBinaryCompare)

' SQUARE ROOT

sBuffer = Replace(sBuffer, ChrW(196), "ƒ", , , vbBinaryCompare)

' LATIN SMALL LETTER F WITH HOOK

sBuffer = Replace(sBuffer, ChrW(197), "≈", , , vbBinaryCompare)

' ALMOST EQUAL TO

sBuffer = Replace(sBuffer, ChrW(198), "∆", , , vbBinaryCompare)

' INCREMENT

sBuffer = Replace(sBuffer, ChrW(199), "«", , , vbBinaryCompare)

' LEFT-POINTING DOUBLE ANGLE QUOTATION MARK

sBuffer = Replace(sBuffer, ChrW(200), "»", , , vbBinaryCompare)

' RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK

sBuffer = Replace(sBuffer, ChrW(201), "…", , , vbBinaryCompare)

' HORIZONTAL ELLIPSIS

sBuffer = Replace(sBuffer, ChrW(202), " ", , , vbBinaryCompare)

' NO-BREAK SPACE

sBuffer = Replace(sBuffer, ChrW(203), "À", , , vbBinaryCompare)

' LATIN CAPITAL LETTER A WITH GRAVE

sBuffer = Replace(sBuffer, ChrW(204), "Ã", , , vbBinaryCompare)

' LATIN CAPITAL LETTER A WITH TILDE

sBuffer = Replace(sBuffer, ChrW(205), "Õ", , , vbBinaryCompare)

' LATIN CAPITAL LETTER O WITH TILDE

sBuffer = Replace(sBuffer, ChrW(206), "Œ", , , vbBinaryCompare)

' LATIN CAPITAL LIGATURE OE

sBuffer = Replace(sBuffer, ChrW(207), "œ", , , vbBinaryCompare)

' LATIN SMALL LIGATURE OE

sBuffer = Replace(sBuffer, ChrW(208), "–", , , vbBinaryCompare)

' EN DASH

sBuffer = Replace(sBuffer, ChrW(209), "—", , , vbBinaryCompare)

' EM DASH

sBuffer = Replace(sBuffer, ChrW(210), "“", , , vbBinaryCompare)

' LEFT DOUBLE QUOTATION MARK

sBuffer = Replace(sBuffer, ChrW(211), "”", , , vbBinaryCompare)

' RIGHT DOUBLE QUOTATION MARK

sBuffer = Replace(sBuffer, ChrW(212), "‘", , , vbBinaryCompare)

' LEFT SINGLE QUOTATION MARK

sBuffer = Replace(sBuffer, ChrW(213), "’", , , vbBinaryCompare)

' RIGHT SINGLE QUOTATION MARK

sBuffer = Replace(sBuffer, ChrW(214), "÷", , , vbBinaryCompare)

' DIVISION SIGN

sBuffer = Replace(sBuffer, ChrW(215), "◊", , , vbBinaryCompare)

' LOZENGE

sBuffer = Replace(sBuffer, ChrW(216), "ÿ", , , vbBinaryCompare)

' LATIN SMALL LETTER Y WITH DIAERESIS

sBuffer = Replace(sBuffer, ChrW(217), "Ÿ", , , vbBinaryCompare)

' LATIN CAPITAL LETTER Y WITH DIAERESIS

sBuffer = Replace(sBuffer, ChrW(218), "⁄", , , vbBinaryCompare)

' FRACTION SLASH

sBuffer = Replace(sBuffer, ChrW(219), "€", , , vbBinaryCompare)

' EURO SIGN

sBuffer = Replace(sBuffer, ChrW(220), "‹", , , vbBinaryCompare)

' SINGLE LEFT-POINTING ANGLE QUOTATION MARK

sBuffer = Replace(sBuffer, ChrW(221), "›", , , vbBinaryCompare)

' SINGLE RIGHT-POINTING ANGLE QUOTATION MARK

sBuffer = Replace(sBuffer, ChrW(222), "fi", , , vbBinaryCompare)

' LATIN SMALL LIGATURE FI

sBuffer = Replace(sBuffer, ChrW(223), "fl", , , vbBinaryCompare)

' LATIN SMALL LIGATURE FL

sBuffer = Replace(sBuffer, ChrW(224), "‡", , , vbBinaryCompare)

' DOUBLE DAGGER

sBuffer = Replace(sBuffer, ChrW(225), "·", , , vbBinaryCompare)

' MIDDLE DOT

sBuffer = Replace(sBuffer, ChrW(226), "‚", , , vbBinaryCompare)

' SINGLE LOW-9 QUOTATION MARK

sBuffer = Replace(sBuffer, ChrW(227), "„", , , vbBinaryCompare)

' DOUBLE LOW-9 QUOTATION MARK

sBuffer = Replace(sBuffer, ChrW(228), "‰", , , vbBinaryCompare)

' PER MILLE SIGN

sBuffer = Replace(sBuffer, ChrW(229), "Â", , , vbBinaryCompare)

' LATIN CAPITAL LETTER A WITH CIRCUMFLEX

sBuffer = Replace(sBuffer, ChrW(230), "Ê", , , vbBinaryCompare)

' LATIN CAPITAL LETTER E WITH CIRCUMFLEX

sBuffer = Replace(sBuffer, ChrW(231), "Á", , , vbBinaryCompare)

' LATIN CAPITAL LETTER A WITH ACUTE

sBuffer = Replace(sBuffer, ChrW(232), "Ë", , , vbBinaryCompare)

' LATIN CAPITAL LETTER E WITH DIAERESIS

sBuffer = Replace(sBuffer, ChrW(233), "È", , , vbBinaryCompare)

' LATIN CAPITAL LETTER E WITH GRAVE

sBuffer = Replace(sBuffer, ChrW(234), "Í", , , vbBinaryCompare)

' LATIN CAPITAL LETTER I WITH ACUTE

sBuffer = Replace(sBuffer, ChrW(235), "Î", , , vbBinaryCompare)

' LATIN CAPITAL LETTER I WITH CIRCUMFLEX

sBuffer = Replace(sBuffer, ChrW(236), "Ï", , , vbBinaryCompare)

' LATIN CAPITAL LETTER I WITH DIAERESIS

sBuffer = Replace(sBuffer, ChrW(237), "Ì", , , vbBinaryCompare)

' LATIN CAPITAL LETTER I WITH GRAVE

sBuffer = Replace(sBuffer, ChrW(238), "Ó", , , vbBinaryCompare)

' LATIN CAPITAL LETTER O WITH ACUTE

sBuffer = Replace(sBuffer, ChrW(239), "Ô", , , vbBinaryCompare)

' LATIN CAPITAL LETTER O WITH CIRCUMFLEX

sBuffer = Replace(sBuffer, ChrW(240), "", , , vbBinaryCompare)

' Apple logo

sBuffer = Replace(sBuffer, ChrW(241), "Ò", , , vbBinaryCompare)

' LATIN CAPITAL LETTER O WITH GRAVE

sBuffer = Replace(sBuffer, ChrW(242), "Ú", , , vbBinaryCompare)

' LATIN CAPITAL LETTER U WITH ACUTE

sBuffer = Replace(sBuffer, ChrW(243), "Û", , , vbBinaryCompare)

' LATIN CAPITAL LETTER U WITH CIRCUMFLEX

sBuffer = Replace(sBuffer, ChrW(244), "Ù", , , vbBinaryCompare)

' LATIN CAPITAL LETTER U WITH GRAVE

sBuffer = Replace(sBuffer, ChrW(245), "ı", , , vbBinaryCompare)

' LATIN SMALL LETTER DOTLESS I

sBuffer = Replace(sBuffer, ChrW(246), "ˆ", , , vbBinaryCompare)

' MODIFIER LETTER CIRCUMFLEX ACCENT

sBuffer = Replace(sBuffer, ChrW(247), "˜", , , vbBinaryCompare)

' SMALL TILDE

sBuffer = Replace(sBuffer, ChrW(248), "¯", , , vbBinaryCompare)

' MACRON

sBuffer = Replace(sBuffer, ChrW(249), "˘", , , vbBinaryCompare)

' BREVE

sBuffer = Replace(sBuffer, ChrW(250), "˙", , , vbBinaryCompare)

' DOT ABOVE

sBuffer = Replace(sBuffer, ChrW(251), "˚", , , vbBinaryCompare)

' RING ABOVE

sBuffer = Replace(sBuffer, ChrW(252), "¸", , , vbBinaryCompare)

' CEDILLA

sBuffer = Replace(sBuffer, ChrW(253), "˝", , , vbBinaryCompare)

' DOUBLE ACUTE ACCENT

sBuffer = Replace(sBuffer, ChrW(254), "˛", , , vbBinaryCompare)

' OGONEK

sBuffer = Replace(sBuffer, ChrW(255), "ˇ", , , vbBinaryCompare)

' CARON





Selection.InsertAfter sBuffer

Selection.Collapse (wdCollapseEnd)



Loop While LngLOF > 0

Close #iFile



StatusBar = "done..."

' Call Hex2Uni5

End Sub







Robix said:
Hi,

I have a text file containing HTML entities (both named and numbered). I
need to convert them to characters they represent before the file goes to
translation and then convert characters back to their HTML entities after
translation. Does anyone know the good (fast) method to do this?

Also I'd like to ask this. It is a LISP file and when I open it in the
text
editor it shows it has ANSI Mac encoding. As I want to process it on
Windows, can some problems arise during file processing (I mean characters
will corrupt, line endings will change, etc.)? (I am writing macro for
this
in Word 2003). I have no experience with processing Mac files on Windows.
 
K

Klaus Linke

Hi Robert,

I had the macros around anyway, so they didn't make any work. Hope they work for you!

Moving stuff back and forth between Mac and Windows always was good for nasty surprises...
Happened to me lots of times that I took some large file in either direction, worked on it for hours or days, and only then noticed that a couple dozen or hundred characters hadn't converted properly.

Now it's getting much simpler because of Unicode.

Good luck with the macros,
Klaus



Robix said:
Hi Klaus,

Thank you very very much for your in depth reply and macros. I appreciate it
much. (I must admit I did not expect any reply already :) Of course I'll try
your macros.


Thank you very much again.
Have a nice day.
Robert


(Sorry for the long post)

Hi Robix,

That's quite a bit you're asking ;-)

You could save as HTML to turn characters into numerical character
references, or add <html>...</html> tags at the start/end of the text file
and then open as HTML to turn numerical character references back to text.
But it may be difficult to figure out which encoding to use, or to strip out
the HTML tags that Word inserts.

For toggling a file back and forth between numerical character references
and Unicode characters, I have some macros, Numrefs2Unichars and
Unichars2Numrefs, that I'll post below.

One problem are symbols from symbol fonts. Your files don't contain any, so
you might comment out the call to SymbolsUnprotect.

Another problem is keeping the formatting. That's not a problem in your case
either, and I have some macros that work on unformatted text and are faster
than those posted below.
But since the macros for formatted text are more general and aren't terribly
slow, I'll post those.

As you guessed, if the files come from the Mac, that's another problem.
I usually open (AppleRoman) text files from the Mac with another macro
(ImportMacBinary) that changes the problematic characters directly into
Unicode numerical character references.
You'll need to paste or type the path and file name into the macro, or add
an interface with a file picker.
If you have access to a Mac, you could also open the text file in a Mac
browser and copy/paste from there. Or save as HTML, doc, or RTF on the Mac.
But all these methods are very likely to mess up a couple of characters, and
so I prefer the macro since that gives me full control.

If you use Central European and Turkish characters, you should make sure
that the final recipient on the Mac is using Word2004. Earlier versions may
not be able to handle the necessary Unicode characters, and you may need to
convert to the proper old Mac code page, using special fonts...

I hope the macros work for you,
Klaus



Option Explicit

Sub Numref2Unichar()
'
Dim start
Dim var$, Var2$, VarVal, VarReplace$
Top:
Selection.HomeKey Unit:=wdStory
Selection.Find.ClearFormatting
Selection.Find.Replacement.ClearFormatting
With Selection.Find
.Text = "&#x[0-9a-fA-F]@;"
.Replacement.Text = "^&"
.Forward = True
.Wrap = wdFindContinue
.Format = True
.MatchCase = False
.MatchWholeWord = False
.MatchAllWordForms = False
.MatchSoundsLike = False
.MatchWildcards = True
End With
If Selection.Find.Execute Then
var$ = Selection.Text
Var2$ = Left$(var$, Len(var$) - 1)
Var2$ = "&H" & Mid$(Var2$, 4)

VarVal = CLng(Val(Var2$))
VarReplace$ = ChrW(VarVal)
If VarVal = 94 Then
VarReplace$ = VarReplace$ & "^"
End If
Selection.Find.ClearFormatting
Selection.Find.Replacement.ClearFormatting
With Selection.Find
.Text = var$
.Replacement.Text = VarReplace$
.MatchCase = True
.MatchWildcards = False
Select Case Val(Var2$)
Case &HF000 To &HF0FF
.Replacement.Font.Name = "Symbol"
Case Else
End Select
End With
Selection.Find.Execute Replace:=wdReplaceAll
Selection.EndKey Unit:=wdStory
Selection.TypeParagraph
Selection.TypeText Text:= _
Right$(Var2$, Len(Var2$) - 2) & vbTab & _
VarReplace$
Else
Numref2Unichar2
Exit Sub
End If
GoTo Top
MsgBox Timer - start
End Sub


Private Sub Numref2Unichar2()
'
Dim var$, Var2$, VarVal, VarReplace$
Top:
Selection.Find.ClearFormatting
Selection.Find.Replacement.ClearFormatting
With Selection.Find
.Text = "&#[0-9]@;"
.Replacement.Text = "^&"
.Forward = True
.Wrap = wdFindContinue
.Format = True
.MatchCase = False
.MatchWholeWord = False
.MatchAllWordForms = False
.MatchSoundsLike = False
.MatchWildcards = True
End With
If Selection.Find.Execute Then
var$ = Selection.Text
Var2$ = Left$(var$, Len(var$) - 1)
Var2$ = Mid$(Var2$, 3)
VarVal = CLng(Val(Var2$))
VarReplace$ = ChrW(VarVal)
With Selection.Find
.Text = var$
.Replacement.Text = VarReplace$
.MatchCase = True
.MatchWildcards = False
End With
Selection.Find.Execute Replace:=wdReplaceAll
Selection.EndKey Unit:=wdStory
Selection.TypeParagraph
Selection.TypeText Text:=Var2$ & vbTab & _
VarReplace$
Else
Exit Sub
End If
GoTo Top
End Sub


Sub Unichar2Numref()
'
Dim myRange As Range
Dim Cancel, c, Flag, Arial
Dim Code As Long
Dim MsgText As String
Dim Tag$, Ausgabe$, txt$
Call SymbolsUnprotect
Cancel = 0
' Word-Unicode in Tags
' First, combining diacritical marks (bug!!)
Selection.Find.ClearFormatting
Selection.Find.Replacement.ClearFormatting
With Selection.Find
.Text = "(?)[" & ChrW(&H300) & "-" & _
ChrW(&H361) & "]"
.Forward = True
.Wrap = wdFindContinue
.Format = False
.MatchCase = False
.MatchWholeWord = False
.MatchAllWordForms = False
.MatchSoundsLike = False
.MatchWildcards = True
End With
While Selection.Find.Execute
Selection.Text = Left(Selection.Text, 1) _
& ";&#x0" & Hex(AscW(Right(Selection.Text, 1))) & ";"
' Für Ausgabe am Ende der Datei:
Ausgabe$ = Right(Selection.Text, 5)
Ausgabe$ = Left(Ausgabe$, 4)
If InStr(1, txt, Ausgabe$) > 0 Then
Else
txt$ = txt$ + Ausgabe$ + vbCrLf
End If
Selection.Collapse (wdCollapseEnd)
Wend
'
Set myRange = ActiveDocument.Content
For Each c In myRange.Characters
' c.Select
Arial = 0
Code = Int2Long(AscW(c))
' Welche Zeichen sollen getaggt werden?
Flag = 0
If Code < 0 Then
Flag = 1
End If
If Code > 127 Then
' If Code < 191 Then
Flag = 1
' End If
End If
If Code > 126 Then
Flag = 1
' Decorative fonts?
If Code > Int2Long(&HF000) Then
If Code < Int2Long(&HF100) Then
If c.Font.Name <> "Arial" Then
Cancel = 0
End If
If Cancel <> vbOK Then
c.Select
MsgText = "Doc has characters from old symbol font " & _
c.Font.Name & vbCrLf
MsgText = MsgText & "Tag them in block U+F000 and continue?"
Cancel = MsgBox(MsgText, vbOKCancel + vbQuestion, "Options:")
If Cancel = vbCancel Then
End
Else
Selection.Find.ClearFormatting
Selection.Find.Replacement.ClearFormatting
Selection.Find.Font.Name = c.Font.Name
Selection.Find.Replacement.Font.Name = "Arial"
With Selection.Find
.Text = ""
.Replacement.Text = "^&"
.Forward = True
.Wrap = wdFindContinue
.Format = True
.MatchCase = True
.MatchWholeWord = False
.MatchWildcards = False
.MatchSoundsLike = False
.MatchAllWordForms = False
End With
Selection.Find.Execute Replace:=wdReplaceAll
End If
End If
Arial = 1
End If
End If
End If
If (Flag = 1) Then
' Zusammenbasteln des Tags:
Tag$ = Trim(Hex(Code))
While Len(Tag$) < 4
Tag$ = "0" + Tag$
Wend
Ausgabe$ = Tag$
Tag$ = "&#x" + Tag$ + ";"
' Alle Zeichen mit diesem Code in Tags ersetzen:
Selection.Find.ClearFormatting
Selection.Find.Replacement.ClearFormatting
With Selection.Find
.Text = ChrW(Code)
.Replacement.Text = Tag$
.Forward = True
.Wrap = wdFindContinue
.Format = False
.MatchCase = True
.MatchWholeWord = False
.MatchWildcards = False
.MatchSoundsLike = False
.MatchAllWordForms = False
End With
Selection.Find.Execute _
Replace:=wdReplaceAll
' Für Ausgabe am Ende der Datei:
txt$ = txt$ & Ausgabe$ & vbTab & _
ChrW(Val("&H" & Ausgabe$)) & vbCr
End If
Flag = 0
Next c
ActiveDocument.Content.Select
Selection.EndKey Unit:=wdStory
Selection.TypeParagraph
Selection.Font.Name = "Arial"
Selection.TypeText ("Unicode characters:")
Selection.TypeParagraph
Selection.InsertAfter txt$
Selection.Sort ExcludeHeader:=False, _
SortFieldType:=wdSortFieldAlphanumeric, _
SortOrder:=wdSortOrderAscending
End Sub

Private Function Int2Long(VarInt As Integer) As Long
Int2Long = VarInt
If Int2Long < 0 Then
Int2Long = Int2Long + 65536
End If
End Function

Private Sub SymbolsUnprotect()
'
Dim SelFont, SelCharNum

Selection.Collapse (wdCollapseStart)
Selection.Find.ClearFormatting
With Selection.Find
.Text = "[" & ChrW(61472) & "-" & _
ChrW(61695) & "]"
.Replacement.Text = ""
.Forward = True
.Wrap = wdFindContinue
.Format = False
.MatchCase = False
.MatchWholeWord = False
.MatchAllWordForms = False
.MatchSoundsLike = False
.MatchWildcards = True
End With
While Selection.Find.Execute
With Dialogs(wdDialogInsertSymbol)
SelFont = .Font
SelCharNum = .CharNum
End With

Selection.Font.Name = SelFont
Selection.TypeText Text:=ChrW(SelCharNum)

' replace the last 2 lines with the following
' to protect symbols from decorative fonts:
' Selection.InsertSymbol _
' Font:=SelFont, _
' CharacterNumber:=SelCharNum, _
' Unicode:=True

Wend
End Sub



Sub ImportMacBinary()

Dim sBuffer As String

Dim iFile As Integer

Dim i As Integer

Dim LngLOF As Long

iFile = FreeFile

StatusBar = "Set up buffer..."

sBuffer = String(2000000, "0")

' Documents.Add

Open "S:\Os2\typotext\FD012005.txt" For Binary As #iFile

StatusBar = "read file..."

LngLOF = LOF(iFile)

Do

If LngLOF < 1000000 Then

sBuffer = Input(LngLOF, #iFile)

LngLOF = 0

Else

sBuffer = Input(1000000, #iFile)

LngLOF = LngLOF - 1000000

End If

StatusBar = LngLOF

' Kontrollzeichen

For i = 0 To 8

sBuffer = Replace(sBuffer, ChrW(i), ChrW(i + &H2400))

Next i

For i = 10 To 12

sBuffer = Replace(sBuffer, ChrW(i), ChrW(i + &H2400))

Next i

For i = 14 To 31

sBuffer = Replace(sBuffer, ChrW(i), ChrW(i + &H2400))

Next i

StatusBar = "convert from Mac..."

' Apple Roman nach Unicode

sBuffer = Replace(sBuffer, Chr(128), "Ä", , , vbBinaryCompare)

' LATIN CAPITAL LETTER A WITH DIAERESIS

sBuffer = Replace(sBuffer, Chr(129), "Å", , , vbBinaryCompare)

' LATIN CAPITAL LETTER A WITH RING ABOVE

sBuffer = Replace(sBuffer, Chr(130), "Ç", , , vbBinaryCompare)

' LATIN CAPITAL LETTER C WITH CEDILLA

sBuffer = Replace(sBuffer, Chr(131), "É", , , vbBinaryCompare)

' LATIN CAPITAL LETTER E WITH ACUTE

sBuffer = Replace(sBuffer, Chr(132), "Ñ", , , vbBinaryCompare)

' LATIN CAPITAL LETTER N WITH TILDE

sBuffer = Replace(sBuffer, Chr(133), "Ö", , , vbBinaryCompare)

' LATIN CAPITAL LETTER O WITH DIAERESIS

sBuffer = Replace(sBuffer, Chr(134), "Ü", , , vbBinaryCompare)

' LATIN CAPITAL LETTER U WITH DIAERESIS

sBuffer = Replace(sBuffer, Chr(135), "á", , , vbBinaryCompare)

' LATIN SMALL LETTER A WITH ACUTE

sBuffer = Replace(sBuffer, Chr(136), "à", , , vbBinaryCompare)

' LATIN SMALL LETTER A WITH GRAVE

sBuffer = Replace(sBuffer, Chr(137), "â", , , vbBinaryCompare)

' LATIN SMALL LETTER A WITH CIRCUMFLEX

sBuffer = Replace(sBuffer, Chr(138), "ä", , , vbBinaryCompare)

' LATIN SMALL LETTER A WITH DIAERESIS

sBuffer = Replace(sBuffer, Chr(139), "ã", , , vbBinaryCompare)

' LATIN SMALL LETTER A WITH TILDE

sBuffer = Replace(sBuffer, Chr(140), "å", , , vbBinaryCompare)

' LATIN SMALL LETTER A WITH RING ABOVE

sBuffer = Replace(sBuffer, Chr(141), "ç", , , vbBinaryCompare)

' LATIN SMALL LETTER C WITH CEDILLA

sBuffer = Replace(sBuffer, Chr(142), "é", , , vbBinaryCompare)

' LATIN SMALL LETTER E WITH ACUTE

sBuffer = Replace(sBuffer, Chr(143), "è", , , vbBinaryCompare)

' LATIN SMALL LETTER E WITH GRAVE

sBuffer = Replace(sBuffer, Chr(144), "ê", , , vbBinaryCompare)

' LATIN SMALL LETTER E WITH CIRCUMFLEX

sBuffer = Replace(sBuffer, Chr(145), "ë", , , vbBinaryCompare)

' LATIN SMALL LETTER E WITH DIAERESIS

sBuffer = Replace(sBuffer, Chr(146), "í", , , vbBinaryCompare)

' LATIN SMALL LETTER I WITH ACUTE

sBuffer = Replace(sBuffer, Chr(147), "ì", , , vbBinaryCompare)

' LATIN SMALL LETTER I WITH GRAVE

sBuffer = Replace(sBuffer, Chr(148), "î", , , vbBinaryCompare)

' LATIN SMALL LETTER I WITH CIRCUMFLEX

sBuffer = Replace(sBuffer, Chr(149), "ï", , , vbBinaryCompare)

' LATIN SMALL LETTER I WITH DIAERESIS

sBuffer = Replace(sBuffer, Chr(150), "ñ", , , vbBinaryCompare)

' LATIN SMALL LETTER N WITH TILDE

sBuffer = Replace(sBuffer, Chr(151), "ó", , , vbBinaryCompare)

' LATIN SMALL LETTER O WITH ACUTE

sBuffer = Replace(sBuffer, Chr(152), "ò", , , vbBinaryCompare)

' LATIN SMALL LETTER O WITH GRAVE

sBuffer = Replace(sBuffer, Chr(153), "ô", , , vbBinaryCompare)

' LATIN SMALL LETTER O WITH CIRCUMFLEX

sBuffer = Replace(sBuffer, Chr(154), "ö", , , vbBinaryCompare)

' LATIN SMALL LETTER O WITH DIAERESIS

sBuffer = Replace(sBuffer, Chr(155), "õ", , , vbBinaryCompare)

' LATIN SMALL LETTER O WITH TILDE

sBuffer = Replace(sBuffer, Chr(156), "ú", , , vbBinaryCompare)

' LATIN SMALL LETTER U WITH ACUTE

sBuffer = Replace(sBuffer, Chr(157), "ù", , , vbBinaryCompare)

' LATIN SMALL LETTER U WITH GRAVE

sBuffer = Replace(sBuffer, Chr(158), "û", , , vbBinaryCompare)

' LATIN SMALL LETTER U WITH CIRCUMFLEX

sBuffer = Replace(sBuffer, Chr(159), "ü", , , vbBinaryCompare)

' LATIN SMALL LETTER U WITH DIAERESIS

sBuffer = Replace(sBuffer, ChrW(160), "†", , , vbBinaryCompare)

' DAGGER

sBuffer = Replace(sBuffer, ChrW(161), "°", , , vbBinaryCompare)

' DEGREE SIGN

sBuffer = Replace(sBuffer, ChrW(162), "¢", , , vbBinaryCompare)

' CENT SIGN

sBuffer = Replace(sBuffer, ChrW(163), "£", , , vbBinaryCompare)

' POUND SIGN

sBuffer = Replace(sBuffer, ChrW(164), "§", , , vbBinaryCompare)

' SECTION SIGN

sBuffer = Replace(sBuffer, ChrW(165), "•", , , vbBinaryCompare)

' BULLET

sBuffer = Replace(sBuffer, ChrW(166), "¶", , , vbBinaryCompare)

' PILCROW SIGN

sBuffer = Replace(sBuffer, ChrW(167), "ß", , , vbBinaryCompare)

' LATIN SMALL LETTER SHARP S

sBuffer = Replace(sBuffer, ChrW(168), "®", , , vbBinaryCompare)

' REGISTERED SIGN

sBuffer = Replace(sBuffer, ChrW(169), "©", , , vbBinaryCompare)

' COPYRIGHT SIGN

sBuffer = Replace(sBuffer, ChrW(170), "™", , , vbBinaryCompare)

' TRADE MARK SIGN

sBuffer = Replace(sBuffer, ChrW(171), "´", , , vbBinaryCompare)

' ACUTE ACCENT

sBuffer = Replace(sBuffer, ChrW(172), "¨", , , vbBinaryCompare)

' DIAERESIS

sBuffer = Replace(sBuffer, ChrW(173), "≠", , , vbBinaryCompare)

' NOT EQUAL TO

sBuffer = Replace(sBuffer, ChrW(174), "Æ", , , vbBinaryCompare)

' LATIN CAPITAL LETTER AE

sBuffer = Replace(sBuffer, ChrW(175), "Ø", , , vbBinaryCompare)

' LATIN CAPITAL LETTER O WITH STROKE

sBuffer = Replace(sBuffer, ChrW(176), "∞", , , vbBinaryCompare)

' INFINITY

sBuffer = Replace(sBuffer, ChrW(177), "±", , , vbBinaryCompare)

' PLUS-MINUS SIGN

sBuffer = Replace(sBuffer, ChrW(178), "≤", , , vbBinaryCompare)

' LESS-THAN OR EQUAL TO

sBuffer = Replace(sBuffer, ChrW(179), "≥", , , vbBinaryCompare)

' GREATER-THAN OR EQUAL TO

sBuffer = Replace(sBuffer, ChrW(180), "¥", , , vbBinaryCompare)

' YEN SIGN

sBuffer = Replace(sBuffer, ChrW(181), "µ", , , vbBinaryCompare)

' MICRO SIGN

sBuffer = Replace(sBuffer, ChrW(182), "∂", , , vbBinaryCompare)

' PARTIAL DIFFERENTIAL

sBuffer = Replace(sBuffer, ChrW(183), "∑", , , vbBinaryCompare)

' N-ARY SUMMATION

sBuffer = Replace(sBuffer, ChrW(184), "∏", , , vbBinaryCompare)

' N-ARY PRODUCT

sBuffer = Replace(sBuffer, ChrW(185), "π", , , vbBinaryCompare)

' GREEK SMALL LETTER PI

sBuffer = Replace(sBuffer, ChrW(186), "∫", , , vbBinaryCompare)

' INTEGRAL

sBuffer = Replace(sBuffer, ChrW(187), "ª", , , vbBinaryCompare)

' FEMININE ORDINAL INDICATOR

sBuffer = Replace(sBuffer, ChrW(188), "º", , , vbBinaryCompare)

' MASCULINE ORDINAL INDICATOR

sBuffer = Replace(sBuffer, ChrW(189), "Ω", , , vbBinaryCompare)

' GREEK CAPITAL LETTER OMEGA

sBuffer = Replace(sBuffer, ChrW(190), "æ", , , vbBinaryCompare)

' LATIN SMALL LETTER AE

sBuffer = Replace(sBuffer, ChrW(191), "ø", , , vbBinaryCompare)

' LATIN SMALL LETTER O WITH STROKE

sBuffer = Replace(sBuffer, ChrW(192), "¿", , , vbBinaryCompare)

' INVERTED QUESTION MARK

sBuffer = Replace(sBuffer, ChrW(193), "¡", , , vbBinaryCompare)

' INVERTED EXCLAMATION MARK

sBuffer = Replace(sBuffer, ChrW(194), "¬", , , vbBinaryCompare)

' NOT SIGN

sBuffer = Replace(sBuffer, ChrW(195), "√", , , vbBinaryCompare)

' SQUARE ROOT

sBuffer = Replace(sBuffer, ChrW(196), "ƒ", , , vbBinaryCompare)

' LATIN SMALL LETTER F WITH HOOK

sBuffer = Replace(sBuffer, ChrW(197), "≈", , , vbBinaryCompare)

' ALMOST EQUAL TO

sBuffer = Replace(sBuffer, ChrW(198), "∆", , , vbBinaryCompare)

' INCREMENT

sBuffer = Replace(sBuffer, ChrW(199), "«", , , vbBinaryCompare)

' LEFT-POINTING DOUBLE ANGLE QUOTATION MARK

sBuffer = Replace(sBuffer, ChrW(200), "»", , , vbBinaryCompare)

' RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK

sBuffer = Replace(sBuffer, ChrW(201), "…", , , vbBinaryCompare)

' HORIZONTAL ELLIPSIS

sBuffer = Replace(sBuffer, ChrW(202), " ", , , vbBinaryCompare)

' NO-BREAK SPACE

sBuffer = Replace(sBuffer, ChrW(203), "À", , , vbBinaryCompare)

' LATIN CAPITAL LETTER A WITH GRAVE

sBuffer = Replace(sBuffer, ChrW(204), "Ã", , , vbBinaryCompare)

' LATIN CAPITAL LETTER A WITH TILDE

sBuffer = Replace(sBuffer, ChrW(205), "Õ", , , vbBinaryCompare)

' LATIN CAPITAL LETTER O WITH TILDE

sBuffer = Replace(sBuffer, ChrW(206), "Œ", , , vbBinaryCompare)

' LATIN CAPITAL LIGATURE OE

sBuffer = Replace(sBuffer, ChrW(207), "œ", , , vbBinaryCompare)

' LATIN SMALL LIGATURE OE

sBuffer = Replace(sBuffer, ChrW(208), "–", , , vbBinaryCompare)

' EN DASH

sBuffer = Replace(sBuffer, ChrW(209), "—", , , vbBinaryCompare)

' EM DASH

sBuffer = Replace(sBuffer, ChrW(210), "“", , , vbBinaryCompare)

' LEFT DOUBLE QUOTATION MARK

sBuffer = Replace(sBuffer, ChrW(211), "”", , , vbBinaryCompare)

' RIGHT DOUBLE QUOTATION MARK

sBuffer = Replace(sBuffer, ChrW(212), "‘", , , vbBinaryCompare)

' LEFT SINGLE QUOTATION MARK

sBuffer = Replace(sBuffer, ChrW(213), "’", , , vbBinaryCompare)

' RIGHT SINGLE QUOTATION MARK

sBuffer = Replace(sBuffer, ChrW(214), "÷", , , vbBinaryCompare)

' DIVISION SIGN

sBuffer = Replace(sBuffer, ChrW(215), "◊", , , vbBinaryCompare)

' LOZENGE

sBuffer = Replace(sBuffer, ChrW(216), "ÿ", , , vbBinaryCompare)

' LATIN SMALL LETTER Y WITH DIAERESIS

sBuffer = Replace(sBuffer, ChrW(217), "Ÿ", , , vbBinaryCompare)

' LATIN CAPITAL LETTER Y WITH DIAERESIS

sBuffer = Replace(sBuffer, ChrW(218), "⁄", , , vbBinaryCompare)

' FRACTION SLASH

sBuffer = Replace(sBuffer, ChrW(219), "€", , , vbBinaryCompare)

' EURO SIGN

sBuffer = Replace(sBuffer, ChrW(220), "‹", , , vbBinaryCompare)

' SINGLE LEFT-POINTING ANGLE QUOTATION MARK

sBuffer = Replace(sBuffer, ChrW(221), "›", , , vbBinaryCompare)

' SINGLE RIGHT-POINTING ANGLE QUOTATION MARK

sBuffer = Replace(sBuffer, ChrW(222), "fi", , , vbBinaryCompare)

' LATIN SMALL LIGATURE FI

sBuffer = Replace(sBuffer, ChrW(223), "fl", , , vbBinaryCompare)

' LATIN SMALL LIGATURE FL

sBuffer = Replace(sBuffer, ChrW(224), "‡", , , vbBinaryCompare)

' DOUBLE DAGGER

sBuffer = Replace(sBuffer, ChrW(225), "·", , , vbBinaryCompare)

' MIDDLE DOT

sBuffer = Replace(sBuffer, ChrW(226), "‚", , , vbBinaryCompare)

' SINGLE LOW-9 QUOTATION MARK

sBuffer = Replace(sBuffer, ChrW(227), "„", , , vbBinaryCompare)

' DOUBLE LOW-9 QUOTATION MARK

sBuffer = Replace(sBuffer, ChrW(228), "‰", , , vbBinaryCompare)

' PER MILLE SIGN

sBuffer = Replace(sBuffer, ChrW(229), "Â", , , vbBinaryCompare)

' LATIN CAPITAL LETTER A WITH CIRCUMFLEX

sBuffer = Replace(sBuffer, ChrW(230), "Ê", , , vbBinaryCompare)

' LATIN CAPITAL LETTER E WITH CIRCUMFLEX

sBuffer = Replace(sBuffer, ChrW(231), "Á", , , vbBinaryCompare)

' LATIN CAPITAL LETTER A WITH ACUTE

sBuffer = Replace(sBuffer, ChrW(232), "Ë", , , vbBinaryCompare)

' LATIN CAPITAL LETTER E WITH DIAERESIS

sBuffer = Replace(sBuffer, ChrW(233), "È", , , vbBinaryCompare)

' LATIN CAPITAL LETTER E WITH GRAVE

sBuffer = Replace(sBuffer, ChrW(234), "Í", , , vbBinaryCompare)

' LATIN CAPITAL LETTER I WITH ACUTE

sBuffer = Replace(sBuffer, ChrW(235), "Î", , , vbBinaryCompare)

' LATIN CAPITAL LETTER I WITH CIRCUMFLEX

sBuffer = Replace(sBuffer, ChrW(236), "Ï", , , vbBinaryCompare)

' LATIN CAPITAL LETTER I WITH DIAERESIS

sBuffer = Replace(sBuffer, ChrW(237), "Ì", , , vbBinaryCompare)

' LATIN CAPITAL LETTER I WITH GRAVE

sBuffer = Replace(sBuffer, ChrW(238), "Ó", , , vbBinaryCompare)

' LATIN CAPITAL LETTER O WITH ACUTE

sBuffer = Replace(sBuffer, ChrW(239), "Ô", , , vbBinaryCompare)

' LATIN CAPITAL LETTER O WITH CIRCUMFLEX

sBuffer = Replace(sBuffer, ChrW(240), "", , , vbBinaryCompare)

' Apple logo

sBuffer = Replace(sBuffer, ChrW(241), "Ò", , , vbBinaryCompare)

' LATIN CAPITAL LETTER O WITH GRAVE

sBuffer = Replace(sBuffer, ChrW(242), "Ú", , , vbBinaryCompare)

' LATIN CAPITAL LETTER U WITH ACUTE

sBuffer = Replace(sBuffer, ChrW(243), "Û", , , vbBinaryCompare)

' LATIN CAPITAL LETTER U WITH CIRCUMFLEX

sBuffer = Replace(sBuffer, ChrW(244), "Ù", , , vbBinaryCompare)

' LATIN CAPITAL LETTER U WITH GRAVE

sBuffer = Replace(sBuffer, ChrW(245), "ı", , , vbBinaryCompare)

' LATIN SMALL LETTER DOTLESS I

sBuffer = Replace(sBuffer, ChrW(246), "ˆ", , , vbBinaryCompare)

' MODIFIER LETTER CIRCUMFLEX ACCENT

sBuffer = Replace(sBuffer, ChrW(247), "˜", , , vbBinaryCompare)

' SMALL TILDE

sBuffer = Replace(sBuffer, ChrW(248), "¯", , , vbBinaryCompare)

' MACRON

sBuffer = Replace(sBuffer, ChrW(249), "˘", , , vbBinaryCompare)

' BREVE

sBuffer = Replace(sBuffer, ChrW(250), "˙", , , vbBinaryCompare)

' DOT ABOVE

sBuffer = Replace(sBuffer, ChrW(251), "˚", , , vbBinaryCompare)

' RING ABOVE

sBuffer = Replace(sBuffer, ChrW(252), "¸", , , vbBinaryCompare)

' CEDILLA

sBuffer = Replace(sBuffer, ChrW(253), "˝", , , vbBinaryCompare)

' DOUBLE ACUTE ACCENT

sBuffer = Replace(sBuffer, ChrW(254), "˛", , , vbBinaryCompare)

' OGONEK

sBuffer = Replace(sBuffer, ChrW(255), "ˇ", , , vbBinaryCompare)

' CARON





Selection.InsertAfter sBuffer

Selection.Collapse (wdCollapseEnd)



Loop While LngLOF > 0

Close #iFile



StatusBar = "done..."

' Call Hex2Uni5

End Sub







Robix said:
Hi,

I have a text file containing HTML entities (both named and numbered). I
need to convert them to characters they represent before the file goes to
translation and then convert characters back to their HTML entities after
translation. Does anyone know the good (fast) method to do this?

Also I'd like to ask this. It is a LISP file and when I open it in the
text
editor it shows it has ANSI Mac encoding. As I want to process it on
Windows, can some problems arise during file processing (I mean characters
will corrupt, line endings will change, etc.)? (I am writing macro for
this
in Word 2003). I have no experience with processing Mac files on Windows.
(I need to mention that after translation the file will contain texts in
several languages - Western European, Central European and Turkish).

Thank you very much in advance,
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top