Differences in ReadabilityStatistics

paxdominus

Greetings,

I'm trying to capture the Flesch-Kincaid Grade Level for a bunch of
documents, which I know how to do, but when comparing it (and all of
the ReadabilityStatistics) programmatically to the pop-up box, the
numbers are noticeably different.

Statistic     Pop-up    Program

Words         740       421
Characters    3221      2078
Paragraphs    24        21
Sentences     48        32
Sen/Para      2.6       1.6
Words/Sen     14.4      13.7
Char/Word     4.1       4.1
Passive       4         0
FRE           79.1      82.5
FKGL          5.1       4.3

It's the exact same document in both places. Obviously, the numbers are
different enough to cause concern, since decisions about how to use the
document are based upon these numbers.

Why are they different? Is there any way to get the actual numbers from
the pop-up box programmatically?
 
Jay Freedman

I can't tell where your "Popup" and "Program" numbers are coming from,
but I can tell you that Word has several different ways of counting
characters, words, sentences, and paragraphs, and none of them agree.

- You can get the values from the Readability Statistics dialog that
appears at the end of a spelling/grammar check.

- You can look at the results of the Tools > Word Count dialog.

- You can take the .Count values of the various collections in the
object model.

This demo shows how to get those values, and running it against any
moderate-sized document will show how far out of whack the methods
are. Further, it shows that the ratios in the Readability Statistics
often don't match the values they're supposed to be calculated from.

Sub Discrepancies()
    Dim msg As String
    ActiveDocument.Repaginate

    msg = "From ToolsWordCount -----------"
    msg = msg & vbCr & "Characters (with spaces):" & vbTab & _
        Dialogs(wdDialogToolsWordCount) _
        .CharactersIncludingSpaces
    msg = msg & vbCr & "Characters (no spaces):" & vbTab & _
        Dialogs(wdDialogToolsWordCount) _
        .Characters
    msg = msg & vbCr & "Words:" & vbTab & vbTab & vbTab & _
        Dialogs(wdDialogToolsWordCount).Words
    msg = msg & vbCr & "Paragraphs:" & vbTab & vbTab & _
        Dialogs(wdDialogToolsWordCount).Paragraphs
    msg = msg & vbCr & "Sentences:" & vbTab & vbTab & "n/a"

    msg = msg & vbCr & vbCr & _
        "From Readability Stats------------"
    msg = msg & vbCr & "Characters:" & vbTab & vbTab & _
        ActiveDocument.ReadabilityStatistics(2).Value
    msg = msg & vbCr & "Words:" & vbTab & vbTab & vbTab & _
        ActiveDocument.ReadabilityStatistics(1).Value
    msg = msg & vbCr & "Paragraphs:" & vbTab & vbTab & _
        ActiveDocument.ReadabilityStatistics(3).Value
    msg = msg & vbCr & "Sentences:" & vbTab & vbTab & _
        ActiveDocument.ReadabilityStatistics(4).Value

    ' first value, in parens, is the calculated ratio
    ' second value is the one returned by the dialog
    msg = msg & vbCr & "Sen/Para: " & Format( _
        CSng(ActiveDocument.ReadabilityStatistics(4).Value) / _
        ActiveDocument.ReadabilityStatistics(3).Value, "(= 0.0)") _
        & vbTab & vbTab & _
        ActiveDocument.ReadabilityStatistics(5).Value
    msg = msg & vbCr & "Words/Sen: " & Format( _
        CSng(ActiveDocument.ReadabilityStatistics(1).Value) / _
        ActiveDocument.ReadabilityStatistics(4).Value, "(= 0.0)") _
        & vbTab & _
        ActiveDocument.ReadabilityStatistics(6).Value
    msg = msg & vbCr & "Char/Word: " & Format( _
        CSng(ActiveDocument.ReadabilityStatistics(2).Value) / _
        ActiveDocument.ReadabilityStatistics(1).Value, "(= 0.0)") _
        & vbTab & _
        ActiveDocument.ReadabilityStatistics(7).Value

    msg = msg & vbCr & vbCr & _
        "From Object Model---------------"
    msg = msg & vbCr & "Characters:" & vbTab & vbTab & _
        ActiveDocument.Characters.Count
    msg = msg & vbCr & "Words:" & vbTab & vbTab & vbTab & _
        ActiveDocument.Words.Count
    msg = msg & vbCr & "Paragraphs:" & vbTab & vbTab & _
        ActiveDocument.Paragraphs.Count
    msg = msg & vbCr & "Sentences:" & vbTab & vbTab & _
        ActiveDocument.Sentences.Count

    MsgBox msg
End Sub

The case of the Flesch Reading Ease and Flesch-Kincaid Grade Level is
even worse. The published formulas for those scores involve the
average number of syllables per word. Nobody outside Microsoft knows
how Word determines the syllable count and the average -- presumably
it's based on the hyphenation dictionary, but it could be just a
rule-of-thumb estimate or a hardcoded average. You can't trust it.
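
For reference, the published formulas are

FRE  = 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
FKGL = 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

A minimal sketch that applies them to counts you supply yourself. The
function names and the sample counts are made up for illustration, and
the syllable count has to come from your own counting, since Word does
not expose one:

```vba
' Published Flesch formulas applied to caller-supplied counts.
' FleschReadingEase and FleschKincaidGrade are illustrative names,
' not part of the Word object model.
Function FleschReadingEase(ByVal nWords As Long, _
        ByVal nSentences As Long, ByVal nSyllables As Long) As Double
    FleschReadingEase = 206.835 _
        - 1.015 * (nWords / nSentences) _
        - 84.6 * (nSyllables / nWords)
End Function

Function FleschKincaidGrade(ByVal nWords As Long, _
        ByVal nSentences As Long, ByVal nSyllables As Long) As Double
    FleschKincaidGrade = 0.39 * (nWords / nSentences) _
        + 11.8 * (nSyllables / nWords) - 15.59
End Function

Sub FleschDemo()
    ' Made-up sample: 100 words, 5 sentences, 150 syllables
    Debug.Print Format(FleschReadingEase(100, 5, 150), "0.0")   ' 59.6
    Debug.Print Format(FleschKincaidGrade(100, 5, 150), "0.0")  ' 9.9
End Sub
```

Feeding hand-counted values from a sample passage into these functions
gives a baseline to compare against whatever Word reports.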

So my advice is not to rely on any of these methods, especially if the
results will be used to classify documents for any important purpose.
Either get a piece of open-source software (not that I know of any,
but some academic must have written some) that can be calibrated
properly, or do a manual count of a representative sample.

--
Regards,
Jay Freedman
Microsoft Word MVP
Email cannot be acknowledged; please post all follow-ups to the
newsgroup so all may benefit.
 
paxdominus

Greetings,

I figured that the Pop-Up stats were the actual
ReadabilityStatistics(1-10). Since I'm pulling those numbers
programmatically from ReadabilityStatistics(1-10), they should be the
same.

Does the Pop-Up pull its stats from a different place? If so, then
where do the ReadabilityStatistics come into play?

As far as I know, since I'm using ReadabilityStatistics(1-10), the two
sets of numbers should be the same.
 
Jay Freedman

Yes, they should be the same, and they always have been whenever I checked.

My point, though, was that you can't trust any of the numbers Word presents
to you in either the popup or the ReadabilityStatistics object in VBA, and
especially not the Flesch scores.
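
To check the two against each other, it may help to dump every entry of
the collection with its name rather than relying on index numbers. A
minimal sketch (the Sub name is arbitrary):

```vba
Sub ListReadabilityStats()
    Dim rs As ReadabilityStatistic
    Dim msg As String

    ' Each ReadabilityStatistic has a Name and a Value
    For Each rs In ActiveDocument.ReadabilityStatistics
        msg = msg & rs.Name & ":" & vbTab & rs.Value & vbCr
    Next rs

    MsgBox msg
End Sub
```

Note that ReadabilityStatistics is also available on a Range object, so
make sure the code and the dialog are both looking at the whole
document and not at a selection or partial range.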

--
Regards,
Jay Freedman
Microsoft Word MVP
Email cannot be acknowledged; please post all follow-ups to the newsgroup so
all may benefit.
 
