txt parser



Hi Folks,

I need a routine in VBA that counts the number of occurrences of every word
in a txt document and creates a txt report listing all the words in the text
and the number of occurrences of each word.

Could you help me find a solution?

Thank you very much

Fumei2 via OfficeKB.com

What have you tried so far? Please post any code you have come up with.

Doug Robbins - Word MVP

Sub WordFrequency()

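Dim SingleWord As String 'Raw word pulled from doc
Const maxwords = 9000 'Maximum unique words allowed
Dim Words(maxwords) As String 'Array to hold unique words
Dim Freq(maxwords) As Long 'Frequency counter for unique words (Long avoids overflow past 32,767)
Dim WordNum As Integer 'Number of unique words
Dim ByFreq As Boolean 'Flag for sorting order
Dim ttlwds As Long 'Total words in the document
Dim Excludes As String 'Words to be excluded
Dim Found As Boolean 'Temporary flag
Dim j As Long, k As Long, l As Long, Temp As Long 'Temporary variables, each typed explicitly
Dim tword As String 'Temporary word used while swapping
Dim Ans As String 'Answer to the sort-order prompt
Dim Totalwords As Long 'Word count from the document properties
Dim aword As Range 'Current word while looping
Dim tmpName As String 'Full name of the attached template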

' Set up excluded words
Excludes = ""
Excludes = InputBox$("Enter words that you wish to exclude, " & _
    "surrounding each word with [ ].", "Excluded Words", "")
' Alternative prompt that also lists the current exclusions:
' Excludes = Excludes & InputBox$("The following words are excluded: " & Excludes &
'     ". Enter words that you wish to exclude, surrounding each word with [ ].",
'     "Excluded Words", "")
' Find out how to sort
ByFreq = True
Ans = InputBox$("Sort by WORD or by FREQ?", "Sort order", "FREQ")
If Ans = "" Then Exit Sub 'Cancelled: leave the macro without halting all running code
If UCase(Ans) = "WORD" Then
ByFreq = False
End If
Selection.HomeKey Unit:=wdStory
System.Cursor = wdCursorWait
WordNum = 0
ttlwds = ActiveDocument.Words.Count
Totalwords = ActiveDocument.BuiltInDocumentProperties(wdPropertyWords)
' Control the repeat
For Each aword In ActiveDocument.Words
SingleWord = Trim(aword)
If SingleWord < "A" Or SingleWord > "z" Then SingleWord = "" 'Out of range?
If InStr(Excludes, "[" & SingleWord & "]") Then SingleWord = "" 'On exclude list?
If Len(SingleWord) > 0 Then
Found = False
For j = 1 To WordNum
If Words(j) = SingleWord Then
Freq(j) = Freq(j) + 1
Found = True
Exit For
End If
Next j
If Not Found Then
WordNum = WordNum + 1
Words(WordNum) = SingleWord
Freq(WordNum) = 1
End If
If WordNum > maxwords - 1 Then
j = MsgBox("The maximum array size has been exceeded. " & _
    "Increase maxwords.", vbOKOnly)
Exit For
End If
End If
ttlwds = ttlwds - 1
StatusBar = "Remaining: " & ttlwds & " Unique: " & WordNum
Next aword

' Now sort it into word order
For j = 1 To WordNum - 1
k = j
For l = j + 1 To WordNum
If (Not ByFreq And Words(l) < Words(k)) Or (ByFreq And _
    Freq(l) > Freq(k)) Then k = l
Next l
If k <> j Then
tword = Words(j)
Words(j) = Words(k)
Words(k) = tword
Temp = Freq(j)
Freq(j) = Freq(k)
Freq(k) = Temp
End If
StatusBar = "Sorting: " & WordNum - j
Next j

' Now write out the results
tmpName = ActiveDocument.AttachedTemplate.FullName
Documents.Add Template:=tmpName, NewTemplate:=False
With Selection
For j = 1 To WordNum
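.TypeText Text:=Words(j) & vbTab & Trim(Str(Freq(j))) & vbCr 'vbCr assumed; the line was cut off after "&"
Next j
End With
' NOTE: the post lost several statements from here on. Lines marked "assumed"
' are reconstructions, not the author's confirmed code.
ActiveDocument.Range.Select 'Assumed: select the tab-separated list
Selection.ConvertToTable Separator:=wdSeparateByTabs 'Assumed: Tables(1) must exist before it is used
Selection.Collapse wdCollapseStart
ActiveDocument.Tables(1).Rows.Add BeforeRow:=Selection.Rows(1)
ActiveDocument.Tables(1).Cell(1, 1).Range.InsertBefore "Word"
ActiveDocument.Tables(1).Cell(1, 2).Range.InsertBefore "Occurrences" 'Assumed header; the text was cut off in the post
' The next line was cut off after "=" in the post, so its value is unknown:
' ActiveDocument.Tables(1).Range.ParagraphFormat.Alignment =
With ActiveDocument.Tables(1) 'Assumed: the start of each summary line was lost
.Rows.Add
.Cell(.Rows.Count, 1).Range.InsertBefore "Total words in Document"
.Cell(.Rows.Count, 2).Range.InsertBefore CStr(Totalwords)
.Rows.Add
.Cell(.Rows.Count, 1).Range.InsertBefore "Number of different words in Document"
.Cell(.Rows.Count, 2).Range.InsertBefore Trim(Str(WordNum))
End With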
System.Cursor = wdCursorNormal
' j = MsgBox("There were " & Trim(Str(WordNum)) & " different words ", vbOKOnly, "Finished")
Selection.HomeKey wdStory

End Sub
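
The macro above counts the words of the document that is open in Word and writes the result to a new Word document. Since the question asked for a txt file in and a txt report out, here is a minimal sketch of that variant. It is not part of the macro above: the names CountWordsInTextFile, inPath and outPath are made up for illustration, and it assumes Windows (for Scripting.Dictionary), an ANSI-encoded file, and that a "word" is simply a run of the letters a to z, so a contraction such as "don't" is split in two.

```vba
' Sketch only: count the words in a plain-text file and write a
' tab-separated report. All names here are hypothetical.
Sub CountWordsInTextFile(ByVal inPath As String, ByVal outPath As String)
    Dim counts As Object      'Scripting.Dictionary, late bound (Windows only)
    Dim fileNum As Integer
    Dim textLine As String
    Dim token As String
    Dim ch As String
    Dim i As Long
    Dim key As Variant

    Set counts = CreateObject("Scripting.Dictionary")

    ' Read the input file line by line
    fileNum = FreeFile
    Open inPath For Input As #fileNum
    Do While Not EOF(fileNum)
        Line Input #fileNum, textLine
        textLine = LCase$(textLine) & " "   'Trailing space flushes the last word
        token = ""
        For i = 1 To Len(textLine)
            ch = Mid$(textLine, i, 1)
            If ch Like "[a-z]" Then
                token = token & ch          'Still inside a word
            ElseIf Len(token) > 0 Then
                ' A missing key reads as Empty, so Empty + 1 gives 1
                counts(token) = counts(token) + 1
                token = ""
            End If
        Next i
    Loop
    Close #fileNum

    ' Write one "word <tab> count" line per unique word (unsorted)
    fileNum = FreeFile
    Open outPath For Output As #fileNum
    For Each key In counts.Keys
        Print #fileNum, key & vbTab & counts(key)
    Next key
    Close #fileNum
End Sub
```

It could be called as CountWordsInTextFile "C:\Temp\book.txt", "C:\Temp\report.txt" (both paths are placeholders). The report comes out in first-seen order; to sort it, the keys and counts can be copied into arrays and run through the same selection sort the macro above uses.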

Hope this helps.

Please reply to the newsgroup unless you wish to avail yourself of my
services on a paid consulting basis.

Doug Robbins - Word MVP, originally posted via msnews.microsoft.com


Karl E. Peterson said:
Doug Robbins - Word MVP used his keyboard to write :

Think Bernye will get an A for that? ;-)

For what?
Well, I want to parse an English book (txt) to build statistics on the
number of word occurrences.
I don't speak English very well, so I want to learn it by watching TV,
speaking with mother-tongue speakers and... reading books.
If I build a list of the most frequent words appearing in the book, I can
read it faster and fix those words better in my mind.

I have already written VBA code for my own specific technical purposes, and
I'm going to step through this VBA code to understand how it works!

*****Thank you very much!!!*****

I will let you know the results of my analysis!!

Merry Christmas to all!

Doug Robbins - Word MVP

Karl may have thought that you asked the question in connection with a
school homework exercise.

The ;-) at the end of his post is a smiley for wink. That is, his
suggestion was in jest.

Sorry if the above introduces a whole new set of English terms for you to

While English is my mother tongue, having spent many years in countries
where English is not the language, I do realise how difficult a language it
is for non-English speaking people to master.

Good luck with your attempt.

Hope this helps.

Please reply to the newsgroup unless you wish to avail yourself of my
services on a paid consulting basis.

Doug Robbins - Word MVP, originally posted via msnews.microsoft.com


Doug Robbins - Word MVP said:
Sub WordFrequency()

Dim SingleWord As String 'Raw word pulled from doc
Const maxwords = 9000 'Maximum unique words allowed
Dim Words(maxwords) As String 'Array to hold unique words
Dim Freq(maxwords) As Integer 'Frequency counter for Unique

It works very well!
It's what I need.
Thanks a lot.


Karl E. Peterson

Doug Robbins - Word MVP used his keyboard to write :
Karl may have thought that you asked the question in connection with a school
homework exercise

Guilty. It sure sounded like one to me.
The ;-) at the end of his post is a smiley for wink. That is, his
suggestion was in jest.

Right, thanks...

