I am counting how often specific digraphs (e.g. er, ou, ought, ow) occur in
the most common 1000 words. Sometimes the digraph will be embedded within a
word and sometimes it occurs at the end.
If my understanding of digraphs is correct, then the only variation you will
need is to separate out digraphs from trigraphs and tetragraphs, so a routine
which allows you to specify a character sequence that may not be followed by
other character(s) should be sufficient.
I also presume that if a word should have two identical digraphs (and I don't
know if any exist), that you would only want to count it once.
That being the case, this UDF should suffice:
To use the UDF, enter a formula of the type:
=SUMPRODUCT(ReCount(range,string_to_find,[string_not_to_follow],[search_entire_string]))
The arguments within brackets [...] are optional.
The [search_entire_string] option will, is set to TRUE, will count multiple
matches within a single string. In other words, looking for "ee" in
beekeeping, the function will return two; otherwise it will return one.
To return all the "ou", you would use:
=SUMPRODUCT(recount(A1:A1000,"ou"))
To return only "ou" not followed by "gh", you would use:
=SUMPRODUCT(recount(A1:A1000,"ou","gh"))
and to count all of the "ou" not followed by "gh" in each word as a separate
entry, you would use:
=SUMPRODUCT(recount(A1:A1000,"ou","gh",TRUE))
As written, the UDF is case insensitive. This could be made an optional
argument, if required.
To enter the VBA code, <alt-F11> opens the VB Editor.
Ensure your project is highlighted in the Project Explorer window, then
Insert/Module and enter the code below into the window that opens:
============================================================
Option Explicit
Function ReCount(s, p, Optional NF = "", Optional g As Boolean = False)
's: the string being searched. It can be a string, _
a cell reference, or a range of cells
'p: the pattern to search for
'NF: an optional pattern that should NOT FOLLOW p
'g: if TRUE, will cause multiple matches within a single string
If NF <> "" Then
p = p & "(?!" & NF & ")"
End If
Select Case VarType(s)
Case Is = vbString
ReCount = Cnt(s, p, g)
Case Is > vbArray
Dim c As Range, i As Long
Dim t
ReDim t(0 To s.Count)
For Each c In s
t(i) = Cnt(c.Text, p, g)
i = i + 1
Next c
ReCount = t
End Select
End Function
Function Cnt(str, sPattern, g)
Dim re As Object
Dim mc As Object
Set re = CreateObject("vbscript.regexp")
re.Pattern = sPattern
re.Global = g
re.ignorecase = True
Set mc = re.Execute(str)
Cnt = mc.Count
End Function
================================
--ron