Problem doing t-tests in Excel

P

Peter Dunn

Hi all

I am wanting to do a t-test in Excel (for a class I am teaching). So
as a practice, I did this (in columns A and B):

34 13
35 23
23 29
13 200000
14 200000
56 12
43

These are my two groups, with means:

31.14285714 66679.5

Certainly the means seem different!

However, the ttest procedure gave me this p-value:
0.174751284

The formula was this:
=TTEST(A1:A7,B1:B6,2,3)

What on earth is going on? Can anyone explain? Can anyone suggest the
correct approach?

P.
 
C

Carl Witthoft

Do you know anything about statistical analysis? What sort of class
are you teaching?

My point is: you should know how to calculate a t-test "manually' and
know what sort of p- values are significant. (and what the p-value
actually indicates).

After that, at least you'll know whether your data is pathological, or
whether Excel has a bug, or the answer is correct.
 
M

Mike Middleton

Peter Dunn -
What on earth is going on? Can anyone explain? <

I'm not sure, but I speculate that your values are so disparate that Excel's
algorithms are encountering numerical errors and returning spurious results
(instead of returning an error code).
Can anyone suggest the correct approach? <

I suggest using data sets that have sample means that are within several
standard deviations of each other (instead of obviously very different
sample means where it's clear that "Certainly the means seem different!").

And, pedagogically, I suggest charting the data before crunching the
numbers. For example, you could insert a column of 1's to the left of the
first data set and insert a column of 2's to the left of the second data
set. Create an XY (Scatter) chart type for the first data pairs and then add
the second set of data pairs to the chart. The quick result should be a kind
of dot plot that reinforces when and why a t-test for equal population means
might be appropriate. I can send you an excerpt from my Data Analysis Using
MS Excel book if you need more details.

- Mike
http://www.MikeMiddleton.com
 
P

Peter Dunn

Do you know anything about statistical analysis?   What sort of class
are you teaching?

Yes I do know stats; quite a lot. (Which is why I never usually use
excel for my stats :->)
My point is:  you should know how to calculate a t-test "manually'  and
know what sort of p- values are significant.    (and what the p-value
actually indicates).

Well, of course I know this (I've been teaching this stuff at
university for over 20 years). If you don't know the answer, just say
so, or don't post at all.
After that, at least you'll know whether your data is pathological, or
whether Excel has a bug, or the answer is correct.

That's right; I agree. I suspect there is a bug, but maybe it is my
ignorance in using Excel for this. I was wondering if I was doing
something wrong: I use R frequently, SPSS occasionally, and have never
used Excel for stats before. (Perhaps I now know why...)

Further: Even when I copy the data from the Excel Help example, I get
the incorrect P-value. The Help says P=0.196; pasting the same data
and code into Mac Excel I get 0.248. Using R on the same example data
and doing the equivalent paired t-test (R being my usual choice of
software), I get 0.196, agreeing with the help and disagreeing with my
Excel output.

Something is wrong with Mac Excel it would seem. But I can find no
other similar reports, which makes me think perhaps I am at fault.

P.
 
M

Mike Middleton

Peter Dunn -
Further: Even when I copy the data from the Excel Help example, I get the
incorrect P-value. The Help says P=0.196; pasting the same data and code
into Mac Excel I get 0.248. <

For me, using the Excel Help data set, the TTEST function returns 0.196 both
with Mac Excel 2004 (version 11.5.8) and Mac Excel 2008 (version 12.2.4).

- Mike
http://www.MikeMiddleton.com
 
B

Bob Greenblatt

Peter Dunn -


I'm not sure, but I speculate that your values are so disparate that
Excel's algorithms are encountering numerical errors and returning
spurious results (instead of returning an error code).


I suggest using data sets that have sample means that are within several
standard deviations of each other (instead of obviously very different
sample means where it's clear that "Certainly the means seem different!").

And, pedagogically, I suggest charting the data before crunching the
numbers. For example, you could insert a column of 1's to the left of
the first data set and insert a column of 2's to the left of the second
data set. Create an XY (Scatter) chart type for the first data pairs and
then add the second set of data pairs to the chart. The quick result
should be a kind of dot plot that reinforces when and why a t-test for
equal population means might be appropriate. I can send you an excerpt
from my Data Analysis Using MS Excel book if you need more details.

- Mike
http://www.MikeMiddleton.com
One of the things I would suspect is how the data got into Excel. Are
the numbers really numbers? Might one or more of the be text? Multiply
all the cells by 1 to force the text to numeric values and try again.
Are the results any different?
 
C

Carl Witthoft

Thanks -- I appreciate the detailed info. I guess I'm just getting
jaded in this group, as most people who post "it doesn't work" questions
don't know the difference between sqrt(2) and 1.414 :-( .

Anyway, I get the same p as you did in Excel v.X

I then ran t.test() in R on the same data:

Rgames> tx = c(34,35,23,13,14,56,43)
Rgames> ty=c(13,23,29,2e5,2e5,12)
Rgames> t.test(tx,ty)

Welch Two Sample t-test

data: tx and ty
t = -1.5809, df = 5, p-value = 0.1748
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-175023.17 41726.46
sample estimates:
mean of x mean of y
31.14286 66679.50000

So it sure looks to me like Excel is at least consistent with R. Do
you have any other tools (or, god forbid, pencil and paper :) ) which
give a different result for p? If not, then my suspicion is that this
is simply a pathological set. You know how p-values don't always mean
what people want them to mean.

The Help example in v.X is
=TTEST({3,4,5,8,9,1,2,4,5},{6,19,3,2,14,4,5,17,1},2,1)
And I get exactly the same answer. You may want to double-check the
Tails and Type values you're using.

Personally, I think two values out in left field with all other values
in a very reasonable range should lead to a p-value that is neither
vanishingly small nor optimistically large.

Carl
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top