Find/Replace stray double-byte (Japanese) chars (English, Word2000)

MLGrant · Nov 17, 2004

Greetings all,

I am trying to build a macro that can efficiently search through a document
and highlight (or underline) any characters that are double-byte/fullwidth
Japanese. Currently, I am resorting to checking each character of the
document, one by one, and inserting an '@' character before each such
character (or run of adjacent characters, if there is more than one).

I have no experience building macros, and have only been able to come up
with the following (excerpt):

Code:

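' Presumably relies on a Japanese system code page: StrConv(..., vbFromUnicode)
' then yields 2 bytes per double-byte character, so the two byte lengths
' match only when every character in the word is double-byte.
WordCount = ActiveDocument.Content.Words.Count
i = 1
While i < WordCount + 1
    Application.StatusBar = i & "/" & WordCount & " processing words..."
    LenOrgWord = LenB(ActiveDocument.Content.Words(i))
    LenCvtdWord = LenB(StrConv(ActiveDocument.Content.Words(i), _
        vbFromUnicode))

    If LenOrgWord = LenCvtdWord Then
        With ActiveDocument.Words(i)
            .InsertBefore ("＠")
            .Font.Color = wdColorRed
        End With
        ' The inserted marker is counted as one extra word; step past it
        i = i + 1
        WordCount = WordCount + 1
    End If
    i = i + 1
Wend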

Can anyone recommend a better, more efficient way of going about this?

I have also tried using the built-in Find function to search for
everything EXCEPT A-Z, etc. (i.e. using the wildcard pattern:
[!A-Za-z0-9^0013-^0255^t^m^x^z^n\@ ]* ). However, I haven't found a
pattern that works 100%.

Any ideas/suggestions?

Thanks.

Helmut Weber · Nov 17, 2004

Hi,
there have been lots of discussions about which looping method
is faster, if that is what you mean:
"For Each" or "For i = 1 To xyz.Count".
It is like sorting in a way: if you have sufficient
information beforehand about the data you have to process,
then it is fairly easy to choose the best method.
Otherwise it is a matter of luck, though there must be a method
that returns the best results statistically. In your case, if
double-byte/fullwidth characters are rather rare, you may start
with units larger than words and check whether your criteria
apply to those units. That is, check, in theory, the document,
sections, paragraphs, and words, and process the smaller
unit only if the analysis of the larger unit tells you that it
contains what you are searching for. If you know that this
something is almost everywhere, then there is no profit in
checking documents, sections, paragraphs...
I'd try
Dim oWrd As Object
For Each oWrd In ActiveDocument.Words
    If oWrd ....
Next
---
Greetings from Bavaria, Germany
Helmut Weber, MVP
"red.sys" & chr(64) & "t-online.de"
Word XP, Win 98
http://word.mvps.org/
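
A minimal sketch of the coarse-to-fine idea above, assuming the criterion is "contains a Hiragana character" (U+3040 to U+309F). The helper HasCharInRange and the macro name MarkHiraganaWords are hypothetical and not from the thread; testing code points is just one possible way to fill in the elided condition, not necessarily what was intended.

```vba
' Hypothetical helper: True if s contains at least one character whose
' code point lies between loCode and hiCode (inclusive).
Function HasCharInRange(ByVal s As String, _
                        ByVal loCode As Long, _
                        ByVal hiCode As Long) As Boolean
    Dim k As Long
    Dim code As Long
    For k = 1 To Len(s)
        ' AscW returns a signed Integer; mask it into 0..65535
        code = AscW(Mid$(s, k, 1)) And &HFFFF&
        If code >= loCode And code <= hiCode Then
            HasCharInRange = True
            Exit Function
        End If
    Next k
End Function

Sub MarkHiraganaWords()
    Dim para As Paragraph
    Dim wrd As Range
    For Each para In ActiveDocument.Paragraphs
        ' Test the larger unit first; skip paragraphs with no match
        If HasCharInRange(para.Range.Text, &H3040&, &H309F&) Then
            For Each wrd In para.Range.Words
                If HasCharInRange(wrd.Text, &H3040&, &H309F&) Then
                    wrd.Font.Color = wdColorRed
                End If
            Next wrd
        End If
    Next para
End Sub
```

If stray characters are rare, most paragraphs cost one string scan and their words are never visited; if they are everywhere, the outer test is pure overhead, as noted above.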

Klaus Linke · Nov 17, 2004

Hi ML,

You could also use a wildcard replacement for the corresponding codes (code
pages), say for Hiragana:

.Text = "[" & ChrW(&H3040) & "-" & ChrW(&H309E) & "]{1;}"
.MatchWildcards = True
' ...

You can replace by itself ("^&") plus some formatting like highlight,
underline, or font color.

Regards,
Klaus

Klaus Linke · Nov 18, 2004

Argh, the old problem of the Windows list separator again!
In the English version, you have to use a comma instead of the semicolon:
.Text = "[" & ChrW(&H3040) & "-" & ChrW(&H309E) & "]{1,}"

Klaus
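
Put together, the wildcard approach might look like the following sketch. The macro name HighlightHiragana is an assumption, as is reading the separator from Application.International(wdListSeparator) so that the same pattern is built on English and German systems; only the .Text pattern and .MatchWildcards come from the posts above.

```vba
Sub HighlightHiragana()
    Dim sep As String
    ' List separator from the regional settings ("," or ";")
    sep = Application.International(wdListSeparator)

    ' Color applied by Replacement.Highlight
    Options.DefaultHighlightColorIndex = wdYellow

    With ActiveDocument.Content.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        ' One or more consecutive Hiragana characters
        .Text = "[" & ChrW(&H3040) & "-" & ChrW(&H309E) & "]{1" & sep & "}"
        ' "^&" re-inserts the matched text, so only formatting changes
        .Replacement.Text = "^&"
        .Replacement.Highlight = True
        .MatchWildcards = True
        .Forward = True
        .Wrap = wdFindStop
        .Format = True
        .Execute Replace:=wdReplaceAll
    End With
End Sub
```

To underline or recolor instead of highlighting, set .Replacement.Font.Underline or .Replacement.Font.Color in place of .Replacement.Highlight.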

MLGrant · Nov 22, 2004

Klaus,

Many, many thanks! Searching based on the code page works perfectly!

I'll do some searching for other code pages like Katakana, etc. However,
could you (or anyone else for that matter) point me to the other code
page(s)?

Again, thank you!

MLGrant

MLGrant · Nov 22, 2004

Never mind, I found what I needed!

Thank you again, one and all.

MLGrant
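
The thread does not record which ranges were found. As a hedged sketch, the same replacement can be repeated over other Unicode blocks (what the posts above call code pages). The block boundaries below are taken from the Unicode standard; the names RangePattern and MarkJapaneseBlocks and the choice of blocks are assumptions, and only the Hiragana range was confirmed to work in this thread.

```vba
' Hypothetical helper: builds the wildcard pattern for one code range.
Function RangePattern(ByVal loCode As Long, ByVal hiCode As Long, _
                      ByVal sep As String) As String
    RangePattern = "[" & ChrW(loCode) & "-" & ChrW(hiCode) & "]{1" & sep & "}"
End Function

Sub MarkJapaneseBlocks()
    Dim blocks As Variant
    Dim blk As Variant
    Dim sep As String

    ' Unicode blocks, as (first, last) code points:
    '   U+3000-U+303F  CJK Symbols and Punctuation
    '   U+3040-U+309F  Hiragana
    '   U+30A0-U+30FF  Katakana
    '   U+4E00-U+9FFF  CJK Unified Ideographs (kanji)
    '   U+FF01-U+FF5E  Fullwidth ASCII variants
    blocks = Array( _
        Array(&H3000&, &H303F&), _
        Array(&H3040&, &H309F&), _
        Array(&H30A0&, &H30FF&), _
        Array(&H4E00&, &H9FFF&), _
        Array(&HFF01&, &HFF5E&))

    ' "," on English systems, ";" on German ones
    sep = Application.International(wdListSeparator)

    For Each blk In blocks
        With ActiveDocument.Content.Find
            .ClearFormatting
            .Replacement.ClearFormatting
            .Text = RangePattern(blk(0), blk(1), sep)
            .Replacement.Text = "^&"
            .Replacement.Font.Color = wdColorRed
            .MatchWildcards = True
            .Forward = True
            .Wrap = wdFindStop
            .Format = True
            .Execute Replace:=wdReplaceAll
        End With
    Next blk
End Sub
```

Running one replacement per block keeps each pattern as simple as the Hiragana one; trimming the list to the blocks that actually occur in the documents saves passes.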

