Searching for character in a Devanagari Unicode glyph


Raghav Das

I hope that the Devanagari characters in the following message are
displayed properly.

For example, if I want to search (using the Find and Replace options of
Microsoft Word) for the character म within the glyph
मि, how do I do it?

I mean to say,

IF there is a Devanagari glyph (which is displayed as a single character)
मि, WHICH is composed of the following two Unicode
characters:

Total 2 characters "मि"

In HEXADECIMAL

92Eh (म) 93Fh (ि)

In DECIMAL

2350 (म) 2367 (ि)

AND I search for the character 92Eh (म) by typing it in the "Find"
box of Microsoft Word, then, although this character is present in the
Word file, MS Word doesn't find it, as if it were not there.

However, when the character 92Eh (म) is NOT followed by the sign
of a VOWEL [i.e. when it is NOT followed by 93Fh (ि)], then
Microsoft Word easily finds it.

But, as mentioned above, it doesn't find it when it is followed by a
VOWEL-sign such as 93Fh (ि).
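
The stored text behind the displayed cluster can be checked outside Word. The following is a minimal Python sketch, not part of the macro: it only confirms that the cluster is stored as the two code points listed above, and that a plain substring search over the stored characters does see the consonant. The variable names are illustrative assumptions.

```python
# The cluster as stored: consonant 'ma' followed by the vowel sign 'i'.
cluster = "\u092E\u093F"

# Code points in decimal and in hexadecimal, in storage order.
code_points = [ord(ch) for ch in cluster]
hex_points = [format(cp, "X") for cp in code_points]

# A plain substring test works on stored characters, not on rendered glyphs.
contains_ma = "\u092E" in cluster

print(len(cluster), code_points, hex_points, contains_ma)
```

So the difficulty described here lies in how Word's Find matches text, not in how the characters are stored.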

QUESTION:

How can I find a character like 92Eh (म) EVEN WHEN it is followed
by a vowel sign such as 93Fh (ि), using the FIND (SEARCH) features of
Microsoft Word?

It is very important for me to know the answer to this question because
I want to write a Word macro to convert DEVANAGARI UNICODE text to
DEVANAGARI text in the ISCII encoding [using my method].

Thanking you, in advance
 

Peter T. Daniels

Normally I have no problem seeing non-roman characters in my email,
but here I see what are probably Unicode code point numbers.

Before you start this quixotic enterprise, are you certain that every
Unicode character has an equivalent in ISCII (which I know nothing
about)? Does ISCII have a separate character for every possible
conjunct akshara with every possible matra, the way Unicode Korean has
a separate character for every possible syllable block?

That seems unlikely ...

Does ISCII automatically form conjuncts, or do you have to input the
reduced form alongside the full form of the base character?

In Word, you should be able to search any sequence of consonants and
vowels (whether or not they are combined), and you might even make
some shortcuts using wildcards, but it isn't entirely clear what
you're trying to do.
 

Raghav Das

Before you start this quixotic enterprise, are you certain that every
Unicode character has an equivalent in ISCII (which I know nothing
about)? Does ISCII have a separate character for every possible
conjunct akshara with every possible matra, the way Unicode Korean has
a separate character for every possible syllable block?
That seems unlikely ...

Yes, every Unicode (Devanagari) character has an equivalent in ISCII.

This is because ISCII means "Indian Standard Code for Information
Interchange".
[Ref--Indian Standard Document 13194, Bureau of Indian Standards,
1991.]

The Unicode (Devanagari part) is based on ISCII.

Yes, ISCII has a separate character for every possible conjunct akshara
with every possible matra. Everything in Unicode (Devanagari part) is
there in ISCII.
Does ISCII automatically form conjuncts, or do you have to input the
reduced form alongside the full form of the base character?

No. ISCII doesn't automatically form conjuncts.

ISCII text isn't a readable Devanagari text. The ISCII is simply a
format. ISCII only contains the basic "consonants", "vowels", "matras of
vowels" and "other marks" just as Unicode (Devanagari) does. Showing the
visible conjuncts on the screen is the job of the programmer. ISCII is
exactly similar to Unicode (Devanagari part of Unicode). Just as the
Unicode Devanagari text in Microsoft Word that is stored in Word files
stores only the "consonants", "vowels" and "matras of vowels" etc., and
the conjuncts that are visibly displayed on screen are rendered by a
component of the operating system (Windows) called "Uniscribe", the
ISCII also only contains "consonants", "vowels" and "matras of vowels"
etc. and NOT the conjunct forms.

There is practically a one-to-one relationship between every character
in ISCII and every character in Unicode (Devanagari) with some
exceptions, but these exceptions can be solved easily.

India still very much depends on the old non-Unicode format text
for publishing of books in Devanagari (Desk-Top-Publishing) [because
the publishing softwares like QuarkXPress, Adobe InDesign etc. do not
support Unicode Devanagari text or have only newly introduced it], so Indian
people haven't got rid of the OLD non-Unicode format text yet. However,
since Unicode Devanagari text is becoming popular on the Internet, web sites
(such as Google) and emails, people frequently need to convert
NON-UNICODE text to Unicode and vice versa.

There are many third party softwares in India (such as ISM, Shree-Lipi
and Indica), which provide converters from ISCII to their format
(non-Unicode) and vice versa, and I have purchased many such third party
softwares. Hence I can readily convert ISCII text to any of the popular
non-Unicode format text of India and vice versa, using these softwares
of India, which I have purchased. Because of availability of these
third-party softwares, for me, doing Unicode-to-non-Unicode conversion
(and vice versa) is equivalent to doing ISCII-to-Unicode conversion (and
vice versa).

Of course, these softwares also provide ISCII-to-Unicode and vice versa
conversion (using non-VBA programming), but I want to do my own
ISCII-to-Unicode and vice versa conversion, using my own Microsoft Word
macros, because their conversions aren't perfect and secondly, I have
successfully replaced many of their conversion tools with my own VBA
Word macros, which I want to do here also.

I have already successfully created an ISCII-to-Unicode (Devanagari)
macro in Microsoft Word. But, now, only the reverse direction
macro---Unicode-to-ISCII macro----has to be created, and that is giving
me problems, as described in my previous message.

I have written the forward direction macro (ISCII-to-Unicode macro)
successfully using "Find and Replace" commands. The macro simply issues
the following block of statements repeatedly with different values in
each occurrence.

For example,


'ISCII-to-Unicode macro [forward macro]
'
'WORKS SUCCESSFULLY
'
'e.g.
'
'(1)
Selection.Find.Text = "^0204"    'ISCII code of Devanagari consonant 'ma'
Selection.Find.Replacement.Text = "^u2350"    'Unicode Devanagari consonant 'ma' [92E hex or 2350 decimal]
Selection.Find.Execute Replace:=wdReplaceAll

'(2)
Selection.Find.Text = "^0219"    'ISCII code of Devanagari matra of vowel "hrasva i"
Selection.Find.Replacement.Text = "^u2367"    'Unicode Devanagari 'matra of hrasva i' [93F hex or 2367 decimal]
Selection.Find.Execute Replace:=wdReplaceAll

'(3) etc.



As mentioned above, this forward macro works perfectly and
successfully.
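
The forward mapping can also be pictured outside Word as a plain lookup table. This is a minimal Python sketch, not the macro itself: the table holds only the two pairs quoted in the macro comments (ISCII 204 to U+092E, ISCII 219 to U+093F), the names `ISCII_TO_UNICODE` and `iscii_to_unicode` are hypothetical, and passing bytes below 128 through as ASCII is an assumption. A real converter would need the full table from the standard and the exceptions mentioned earlier.

```python
# Hypothetical two-entry table; only these pairs appear in the thread.
ISCII_TO_UNICODE = {
    204: "\u092E",  # consonant 'ma'
    219: "\u093F",  # matra of 'hrasva i'
}

def iscii_to_unicode(data: bytes) -> str:
    """Map each ISCII byte to one Unicode character (sketch only)."""
    chars = []
    for byte in data:
        if byte < 0x80:
            # Assumption: the lower half is plain ASCII and passes through.
            chars.append(chr(byte))
        elif byte in ISCII_TO_UNICODE:
            chars.append(ISCII_TO_UNICODE[byte])
        else:
            raise ValueError(f"no mapping for ISCII byte {byte}")
    return "".join(chars)

converted = iscii_to_unicode(bytes([204, 219]))
```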


But the reverse macro gives problems.


'Unicode-to-ISCII macro [reverse macro]
'
'GIVES PROBLEMS
'BECAUSE THE CHARACTER Unicode Devanagari 'ma' [92E hex or 2350 decimal]
'ISN'T FOUND BY MICROSOFT WORD WHEN IT IS IMMEDIATELY FOLLOWED (IN THE FILE)
'BY A VOWEL-MATRA, SUCH AS Unicode Devanagari 'matra of hrasva i'
'[93F hex or 2367 decimal]
'
'IT IS NOT FOUND BY THE FOLLOWING COMMAND, WHEN IT IS FOLLOWED BY ANY VOWEL-MATRA.
'
'e.g.
'
'(1)
Selection.Find.Text = "^u2350"    'Unicode Devanagari consonant 'ma' [92E hex or 2350 decimal]
Selection.Find.Replacement.Text = "^0204"    'ISCII code of Devanagari consonant 'ma'
Selection.Find.Execute Replace:=wdReplaceAll

'(2)
Selection.Find.Text = "^u2367"    'Unicode Devanagari 'matra of hrasva i' [93F hex or 2367 decimal]
Selection.Find.Replacement.Text = "^0219"    'ISCII code of Devanagari matra of vowel "hrasva i"
Selection.Find.Execute Replace:=wdReplaceAll

'(3) etc.



As mentioned above (as a comment in the macro), the character Unicode
Devanagari 'ma' [92E hex or 2350 decimal] isn't found by Microsoft Word
when it is immediately followed (in the file) by a vowel-matra such as
Unicode Devanagari 'matra of hrasva i' [93F hex or 2367 decimal]

But when such character (Unicode Devanagari 'ma') exists singly in the
file [i.e. when it is NOT followed by any vowel matra] it is replaced by
the macro.

So, the command to replace Unicode Consonant with ISCII consonant
sometimes becomes successful and sometimes doesn't-----when the
consonant is present singly in the file, it is successful, and when the
consonant is followed by a vowel matra, it is unsuccessful.

This is faulty behavior on the part of Microsoft Word. The command should
always replace it, whether or not it is followed by a vowel matra.

That is what I am talking about.

[As a matter of fact, the matra of a vowel ALSO isn't found when it is
combined with the consonant. It is found only when it appears singly,
meaninglessly. (We know that the matra of a vowel cannot appear singly
meaningfully, although it is possible to type a single vowel-matra,
which would have no meaning.)]
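
For comparison, here is how the reverse direction looks when the stored text is processed one code point at a time, outside Word's Find. This is a minimal Python sketch under stated assumptions, not a fix for the macro: the table holds only the two pairs from the macro comments, the names `UNICODE_TO_ISCII` and `unicode_to_iscii` are hypothetical, and characters below U+0080 are assumed to pass through as single bytes.

```python
# Hypothetical two-entry table, the inverse of the pairs quoted in the thread.
UNICODE_TO_ISCII = {
    "\u092E": 204,  # consonant 'ma'
    "\u093F": 219,  # matra of 'hrasva i'
}

def unicode_to_iscii(text: str) -> bytes:
    """Map each stored Unicode character to one ISCII byte (sketch only)."""
    out = bytearray()
    for ch in text:
        if ord(ch) < 0x80:
            # Assumption: ASCII characters keep their byte value.
            out.append(ord(ch))
        elif ch in UNICODE_TO_ISCII:
            out.append(UNICODE_TO_ISCII[ch])
        else:
            raise ValueError(f"no mapping for U+{ord(ch):04X}")
    return bytes(out)

# The consonant is converted whether or not a matra follows it.
alone = unicode_to_iscii("\u092E")
with_matra = unicode_to_iscii("\u092E\u093F")
```

Because the loop visits stored characters rather than rendered clusters, the consonant and the matra are each converted regardless of what follows or precedes them.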


In Word, you should be able to search any sequence of consonants and
vowels (whether or not they are combined), and you might even make
some shortcuts using wildcards, but it isn't entirely clear what
you're trying to do.

You cannot.

When any Devanagari Unicode consonant appears singly in the file, then
you can search for it using Find command (either through UI or
programmatically), but when it is followed by a vowel sign, it is NOT
found by Microsoft Word (neither through UI nor programmatically), as
explained above, AND also as explained in my previous message. [Both the
messages use the same examples.]

Is there any option in Microsoft Word which would enable it to find the
consonants embedded in consonant clusters? (or a consonant embedded in
consonant+vowel-matra?)

OR should I ask in another way? Has anyone made any Unicode Devanagari
font, which doesn't implement ANY conjuncts------just as it would show
up under Windows 98, where no "Uniscribe" is present and the operating
system wouldn't display any conjuncts-----just plain consonants, vowels
and matras of vowels? (which, of course, isn't readable).

[If anyone has made such a font, then I can solve my problem
temporarily. I would just apply that font to my text, and then run my
macro on it, which will replace all characters now, and my
Unicode-to-ISCII conversion, USING MACRO, would become successful.]
 

Peter T. Daniels

I think what you are asking is for the Microsoft engineers to provide
you with a way to neutralize the component that combines the character
codes into the codes that yield the combined glyphs.

I don't think you can get them to do that.

Can you modify a font so that it will not behave like OpenType, so
that the renderer can't "see" that items need to be combined?

Also: I just got back proofs of an article that has examples in
Chinese, Arabic, and Sanskrit script (among others), and in every case
what is printed is only the citation forms of the letters -- no
connections in Arabic, no conjuncts in Sanskrit, no fanqie in Chinese
-- and this, Oxford University Press tells me, was typeset in India!
(The MSWord file supplied to the typesetter by the editor had
everything exactly correct.)

 
