Advanced string comparison

Pablo Cardellino

Hi,



I'm writing my own proofreading tool, because some Portuguese orthographic
rules are going to change on January 1st, 2009, and it seems Microsoft will
not distribute a new dictionary for older versions of Word (only for 2007). So
I need to compare words (simple words and compound words, i.e. two or more
words joined by hyphens).



I'm having some difficulty performing the string comparisons. For
example, a new rule for compound words is that when the first word ends
with a certain vowel and the second one begins with a different vowel, we must
simply join them, without a hyphen:



Until 2008 we have written "contra-ordem"; from 2009 on, we'll write
"contraordem"



But if the vowel is the same, the rule is exactly the opposite:



Until 2008: "contraataque"; from 2009 on: "contra-ataque"



I can't work out how to model this type of string comparison. How can I code
an expression that checks whether the vowels surrounding the hyphen are
equal or not?



Thanks in advance,



Pablo Cardellino
 
Helmut Weber

Hi Pablo,
> Until 2008 we have written "contra-ordem"; from 2009 on, we'll write
> "contraordem"

Not a problem, I'd say,
if there are otherwise no sequences
of vowel, hyphen, different vowel in Portuguese,
or at least in your texts.

> But if the vowel is the same, the rule is exactly the opposite:
> Until 2008: "contraataque"; from 2009 on: "contra-ataque"

Well, if there are otherwise no words in Portuguese
that contain a vowel followed by the same vowel,
that's not a problem either. But I think such words
are likely to exist, so I'll wait for your confirmation
that they don't before trying a solution.

--

Greetings from Bavaria, Germany

Helmut Weber, MVP WordVBA

Vista Small Business, Office XP
 
Pablo Cardellino

Hi Helmut,
> Not a problem, I'd say,
> if there are otherwise no sequences
> of vowel, hyphen, different vowel in Portuguese,
> or at least in your texts.

I'm sorry, I'm afraid I omitted an important piece of information: compound
words exist for every possible pair of ending vowel + beginning vowel, but not
all of them fall under these rules. For example, "abelha-operária" (which
means worker bee) will be written "abelha operária" from now on. The rules
I'm trying to work on now apply only when the first element is one
of a limited list of prefixes, such as "anti-", "arqui-", "auto-", "extra-",
"semi-", "contra-", "vice-"

> Well, if there are otherwise no words in Portuguese
> that contain a vowel followed by the same vowel,
> that's not a problem either. But I think such words
> are likely to exist, so I'll wait for your confirmation
> that they don't before trying a solution.

The same goes for this case: this rule only affects words formed by a
prefix ending in a vowel + a word beginning with a vowel.
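
The two prefix rules described above could be expressed as a plain string function, roughly like the sketch below. It is only an illustration under stated assumptions: the name SuggestSpelling is hypothetical, the prefix list is the partial one given in this post, only unaccented lowercase vowels are recognised, and any word that merely happens to start with one of the prefixes is treated as a prefixed compound.

```vba
' Sketch only: SuggestSpelling and its prefix list are illustrative assumptions.
Function SuggestSpelling(ByVal sWord As String) As String
    Const VOWELS As String = "aeiou"   ' accented vowels are not handled here
    Dim vPrefix As Variant
    Dim sPrefix As String
    Dim sRest As String
    Dim sFirst As String

    SuggestSpelling = sWord            ' default: leave the word unchanged

    For Each vPrefix In Array("anti", "arqui", "auto", "extra", "semi", "contra", "vice")
        sPrefix = vPrefix
        If LCase$(Left$(sWord, Len(sPrefix))) = sPrefix Then
            ' Everything after the prefix, without the hyphen if there is one
            sRest = Mid$(sWord, Len(sPrefix) + 1)
            If Left$(sRest, 1) = "-" Then sRest = Mid$(sRest, 2)
            sFirst = LCase$(Left$(sRest, 1))
            If Len(sFirst) = 1 And InStr(VOWELS, sFirst) > 0 Then
                If sFirst = Right$(sPrefix, 1) Then
                    ' Same vowel on both sides: hyphenate
                    SuggestSpelling = Left$(sWord, Len(sPrefix)) & "-" & sRest
                Else
                    ' Different vowels: join without hyphen
                    SuggestSpelling = Left$(sWord, Len(sPrefix)) & sRest
                End If
            End If
            Exit Function
        End If
    Next vPrefix
End Function
```

For example, SuggestSpelling("contra-ordem") returns "contraordem" and SuggestSpelling("contraataque") returns "contra-ataque". Words without one of the listed prefixes, such as "abelha-operária", come back unchanged, so that case would need its own rule.
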
> Greetings from Bavaria, Germany

Regards from Florianópolis, Brazil

Pablo Cardellino
EN > ES <> PT Translator
 
Helmut Weber

Hi Pablo,

The following assumes that the text consistently uses
the older spelling. If old and new spellings are mixed,
I think you are out of luck.

Sub Test444a()
    Dim MyArr() As String
    Dim Chr1 As String
    Dim Chr2 As String
    Dim rDcm As Range
    Dim lCnt As Long

    MyArr = Split("anti-arqui-auto-extra-semi-contra-vice", "-")

    ' Pass 1: prefix + hyphen + vowel. Drop the hyphen if the vowels differ.
    For lCnt = 0 To UBound(MyArr)
        Set rDcm = ActiveDocument.Range      ' search from the top for every prefix
        With rDcm.Find
            .Text = "<" & MyArr(lCnt) & "-[aeiou]"   ' no commas inside the brackets
            .MatchWildcards = True           ' wildcard searches are case-sensitive
            .Wrap = wdFindStop               ' stop at the end of the document
            While .Execute
                ' rDcm.Select: Stop          ' uncomment for testing
                Chr1 = rDcm.Characters.Last.Previous.Previous
                Chr2 = rDcm.Characters.Last
                If Chr1 <> Chr2 Then
                    rDcm.Characters.Last.Previous.Delete
                    rDcm.HighlightColorIndex = wdYellow
                End If
                ' Continue after this match, whether it was changed or not
                rDcm.Collapse Direction:=wdCollapseEnd
                rDcm.End = ActiveDocument.Range.End
            Wend
        End With
    Next

    ' Pass 2: prefix + vowel without hyphen. Insert a hyphen if the vowels are equal.
    For lCnt = 0 To UBound(MyArr)
        Set rDcm = ActiveDocument.Range
        With rDcm.Find
            .Text = "<" & MyArr(lCnt) & "[aeiou]"
            .MatchWildcards = True
            .Wrap = wdFindStop
            While .Execute
                ' rDcm.Select: Stop          ' uncomment for testing
                Chr1 = rDcm.Characters.Last.Previous
                Chr2 = rDcm.Characters.Last
                If Chr1 = Chr2 Then
                    rDcm.Characters.Last = "-" & Chr2
                    rDcm.HighlightColorIndex = wdYellow
                End If
                rDcm.Collapse Direction:=wdCollapseEnd
                rDcm.End = ActiveDocument.Range.End
            Wend
        End With
    Next

    ' Pass 3: any other vowel-hyphen-vowel that the first two passes did not
    ' highlight gets a space instead of the hyphen. One pass is enough here,
    ' since the search text does not depend on the prefix list.
    Set rDcm = ActiveDocument.Range
    With rDcm.Find
        .Text = "[aeiou]-[aeiou]"
        .MatchWildcards = True
        .Wrap = wdFindStop
        While .Execute
            ' rDcm.Select: Stop              ' uncomment for testing
            If rDcm.HighlightColorIndex = wdNoHighlight Then
                rDcm.Text = Replace(rDcm.Text, "-", " ")
            End If
            rDcm.Collapse Direction:=wdCollapseEnd
            rDcm.End = ActiveDocument.Range.End
        Wend
    End With
End Sub

You may remove the highlighting by:
ActiveDocument.Range.HighlightColorIndex = wdNoHighlight

Good luck.
--

Greetings from Bavaria, Germany

Helmut Weber, MVP WordVBA

Vista Small Business, Office XP
 
Pablo Cardellino

Thanks Helmut,

I won't use the code as is, because my approach is different: I designed a
form similar to the built-in proofreading dialog, showing the misspelled word
in context, an alternate spelling and some buttons: replace, replace all,
ignore, ignore all, add to dic (the dictionary is a Word table inside a
system .doc). The tool walks through the whole document text, evaluating
each word and checking whether it matches some rule. The first check is
against the dictionary: if the word matches an entry in the left column,
the suggestion in the corresponding right column is offered to the user.
It is always the user who decides (although, for a given entry, he/she can
press the replace all button, and I'm going to add an "Accept all
suggestions" button too).
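
The "dictionary first" step described above might look roughly like the following sketch. It is an assumption-laden illustration, not the actual tool: the names BuildDictionary and LookupSuggestion are hypothetical, and an in-memory VBA Collection stands in for the Word table (left column as key, right column as item), filled with the two example words from this thread.

```vba
' Sketch only: hypothetical helpers, with a Collection standing in for the table.
Function BuildDictionary() As Collection
    Dim oDic As New Collection
    ' Key = old spelling (left column), Item = suggestion (right column).
    ' Collection keys are matched without regard to case.
    oDic.Add Item:="contraordem", Key:="contra-ordem"
    oDic.Add Item:="contra-ataque", Key:="contraataque"
    Set BuildDictionary = oDic
End Function

Function LookupSuggestion(ByVal oDic As Collection, ByVal sWord As String) As String
    ' Returns "" when the word has no entry, so the caller can
    ' fall back to the rule-based checks.
    On Error Resume Next
    LookupSuggestion = oDic.Item(sWord)
    On Error GoTo 0
End Function
```

With this, LookupSuggestion(BuildDictionary(), "contra-ordem") returns "contraordem", and a word without an entry returns an empty string.
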

Well, I won't use your whole code, but I got the idea for the match, which
will work. Thank you very much. I have another question, but I'd better
start a new thread.

Best regards,

Pablo


