Finding Double-Byte Characters

Mike Faulkner

Hello
I would like to view some code that finds Double-Byte Characters. I want to
remove Double-Byte characters before performing a DeltaView comparison.

The code relating to 'Double-Byte' takes a long time to search a large
document.

Any help would be much appreciated.

Regards
Mike
 
Jonathan West

We can't help much unless you show us the code you are using at the moment.


--
Regards
Jonathan West - Word MVP
www.intelligentdocuments.co.uk
Please reply to the newsgroup
Keep your VBA code safe, sign the ClassicVB petition www.classicvb.org
 
Mike Faulkner

Jonathan

Thanks for your interest. Do you have any code to search for Double-Byte
characters? If not please do not reply.

Regards
Mike
 
Tony Jollans

What do you mean when you say "double byte characters"?

Do you mean old-style DBCS strings bounded by SO/SI characters?
Do you mean any Unicode characters stored as UCS-2?
Do you mean Unicode surrogate pairs for code points above plane 0?
Do you mean any character with an ANSI code higher than 127? Or 255?
Or what?

What code exactly are you referring to when you say it takes a long time?
Perhaps if you posted it we could see (a) what you were trying to do and (b)
how it might be possible to improve its performance.
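
To make the distinctions in the questions above concrete, here is a minimal Python sketch that sorts each character of a plain-text string into one of those buckets. It works on Unicode code points, which is only an approximation of "ANSI code", and the function names and bucket labels are made up for illustration. Getting the text out of the Word document is assumed to have happened already.

```python
from collections import Counter

# Shift Out / Shift In control characters that bracket old-style DBCS runs.
SHIFT_OUT, SHIFT_IN = "\x0e", "\x0f"


def classify(ch: str) -> str:
    """Return a rough bucket for one character (labels are illustrative)."""
    cp = ord(ch)
    if ch in (SHIFT_OUT, SHIFT_IN):
        return "so_si"          # marker for an old-style DBCS run
    if cp <= 127:
        return "ascii"          # plain 7-bit character
    if cp <= 255:
        return "high_8bit"      # above 127 but still fits in one byte
    if cp <= 0xFFFF:
        return "bmp"            # plane 0, one 16-bit unit in UCS-2
    return "supplementary"      # above plane 0, a surrogate pair in UTF-16


def summarize(text: str) -> Counter:
    """Count how many characters of `text` fall into each bucket."""
    return Counter(classify(ch) for ch in text)
```

Running `summarize` over a document's text gives a quick idea of which of the meanings actually occurs in it.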
 
Mike Faulkner

Tony

Many thanks for replying. The only code I have found is actually in this
Forum. To find it please search on 'Double-Byte'. I have used it. It loops
through every character in a document.

I know that there is a faster way of finding DBCS. I am evaluating a product
called DocXtools by Microsystems. It displays a toolbar and jumps from one
DBCS to another very quickly. Their code is embedded in DLLs.

As for what type of DBCS, I'm not really sure. DeltaView (a document
comparison app) sometimes hangs when it encounters one.

Many thanks again

Regards
Mike
 
Doug Robbins - Word MVP

Help people help you by including the code in your post.

It is quite a reasonable request, and telling people like Jonathan West not
to reply if they don't happen to have such code at their fingertips is no way
to get help.

--
Hope this helps.

Please reply to the newsgroup unless you wish to avail yourself of my
services on a paid consulting basis.

Doug Robbins - Word MVP
 
Tony Jollans

It seems I have to register in some way to get hold of DocXtools so I'm not
going to be finding out much about that.

I did have a quick play with DeltaView but it seems to cope just fine with
all Unicode characters - normal double byte ones and surrogate pairs.

The only example code I could find by searching this newsgroup looked
explicitly for Hiragana (or Katakana, I forget which now) characters, which
happen to be 'double byte' (their presence in a document, however, makes the
whole document double byte) but are hardly the sum total of double byte
characters, so I'm afraid I'm none the wiser about what it is you really
want.

It would be better if you could post the code. Without it and/or a fuller
description of what you want (perhaps the DocXtools documentation says what
it does?) I can't really help you any more. If it's a simple find and replace
as per the code I found, I doubt you will find anything faster, and I am
surprised you find it particularly slow. But I don't know what the code in
the DLLs might be doing; it possibly goes way beyond what can be done in VBA.
 
Mike Faulkner

Tony

Many thanks for your detailed reply. However, I am assured by the
Microsystems (DocX) people that DBCS will occasionally cause DeltaView (DV)
version 2.x to hang.

I run a VBA tool on approx. 5,000 documents. It extracts various items of
information on each document, Revisions, Char styles, Broken styles, DV
Bookmarks & Styles etc. Adding DBCS to the list would have helped to narrow
down the problem areas users encounter when performing DeltaView comparisons.

I'll speak to the DocX developers and try to find out a bit more about their
DBCS search tool.

Once again many thanks for your time.

Regards
Mike
 
Jean-Guy Marcil

Mike Faulkner was telling us:
Tony

Many thanks for replying. The only code I have found is actually in
this Forum. To find it please search on 'Double-Byte'. I have used
it. It loops through every character in a document.

Let me see if I can get this straight..

You need help, two very knowledgeable people offer to help.
You were rude with one, and told the other to search the group to find
examples of what you mean?

Jeezz... the nerves...

You were very lucky that Tony decided to ignore all this and ploughed on
with offering help.

I understand that you might be busy... but... next time, I might suggest
that you be a bit more considerate to those who give up part of their free
time to help others...

--
Salut!
_______________________________________
Jean-Guy Marcil - Word MVP
(e-mail address removed)
Word MVP site: http://www.word.mvps.org
 
Klaus Linke

My sentiments exactly :)

A wildcard search would be fast, too. Say "Match wildcards",
Find what: [!^001-^0255]

I can't imagine that any Word add-in in a halfway recent version (post-97)
has general problems with Unicode, though.
You might try to find out which specific characters, if any, DeltaView has
problems with.

Regards
Klaus
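
For checking text outside Word, a rough Python counterpart to the wildcard pattern above might look like the sketch below. It assumes the document text has already been extracted to a plain string; the helper name `find_wide_chars` is made up for this example. Each hit comes back with its position, code point and Unicode name, which may help in narrowing down which specific characters are involved.

```python
import re
import unicodedata

# Any character whose code point lies outside 1-255, mirroring the idea of
# the Word wildcard pattern [!^001-^0255] (an approximation, not Word itself).
OUTSIDE_1_255 = re.compile(r"[^\x01-\xff]")


def find_wide_chars(text: str) -> list:
    """Return (position, code point, Unicode name) for each matching character."""
    hits = []
    for match in OUTSIDE_1_255.finditer(text):
        ch = match.group()
        name = unicodedata.name(ch, "<unnamed>")
        hits.append((match.start(), f"U+{ord(ch):04X}", name))
    return hits
```

The regex engine scans the string in one pass, so there is no need for an explicit character-by-character loop in the calling code.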
 
Mike Faulkner

Klaus

Many thanks for your advice. Workshare (DeltaView) are reluctant to reveal
what stops its product and 'hangs' Word.

Microsystems (DocXtools) insist that DBCS cause comparison problems. Their
toolbar is very fast on a 150 page document. However, it's possible that
their Discovery (DocXtools) bookmarks the DBCS characters and the toolbar
simply jumps from one bookmark to the next. This would explain why it takes
5 minutes to 'Discover' a 150 page document: it's looping through every
character and testing whether it's a DBCS.

Regards
Mike
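
The guess above (one slow pass that records positions, then fast jumps between them) can be sketched in Python as follows. This only illustrates the idea and is not how DocXtools actually works; the function names and the "code point above 255" test are assumptions made for the example.

```python
from bisect import bisect_right


def build_index(text: str) -> list:
    """Slow pass: record the position of every character above code point 255."""
    return [i for i, ch in enumerate(text) if ord(ch) > 255]


def next_hit(index: list, cursor: int):
    """Fast jump: first recorded position strictly after `cursor`, or None."""
    pos = bisect_right(index, cursor)
    return index[pos] if pos < len(index) else None
```

Building the index costs one pass over the whole text, but each later jump is just a binary search in a sorted list, which would fit a slow 'Discover' followed by a quick toolbar.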

 
Tony Jollans

What I note about this is the consistent use of the terms "Double Byte" and
"DBCS", rather than Unicode, and I wonder whether this is the real issue. I
can well imagine that modern products could get hung up with 'old' DBCS
data.

--
Enjoy,
Tony


 
Mike Faulkner

Tony

Many thanks for replying.

Now I understand why Microsystems (DocXtools) insist that DBCS can interfere
with comparison tools. Approx. 80% of our documents are copies (Dupe &
Revise) of old documents. This practice ensures that old problems/issues,
like DBCS, are continually carried forward.

DocXtools has identified 5 DBCS in a document this morning. They are visible
'blobs' (large full stops). The Word Symbol chart identifies them as
SPACE, Character Code: 32, from: ASCII (decimal).

Regards
Mike

 
