Apparent bug in CHIINV!

Boris

Hi,
I am having some real problems with the CHIINV function in Excel (I am using
Excel 2007, but the same seems to be true of all previous versions of Excel,
at least back to 2002). According to the help file, CHIINV can take
probabilities >= 0 and <= 1 (just as well that it can, though there seems to
be an issue at the 50% probability) and degrees of freedom up to but not
including 10^10 (i.e. 10 billion). However, if you try to calculate CHIINV
at, say, a probability of 0.05 and set the degrees of freedom to anything
above 1806, the formula immediately returns #NUM!. Now, 1806 is far smaller
than 10^10, so such values should be perfectly legitimate. In my application
I need to generate CHIINV values for degrees of freedom up to 90,000 (still
far less than the 10^10 given in the function reference), so I am really
getting stuck...
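To reproduce, the boundary is easy to see directly (probability 0.05 here, as
above):

=CHIINV(0.05,1806)    returns a value as expected
=CHIINV(0.05,1807)    returns #NUM!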
Is this a bug? Is there a workaround? Does anyone have any suggestions as to
what may be limiting the CHIINV degrees of freedom to <= 1806, and why this
particular number should be the limiting value?
I would really appreciate any hints (and confirmation that this error
persists across platforms and versions of Excel, as I have only been able to
try it on a few machines running Excel XP, 2003 or 2007). If anyone knows of
a clever way to work around this apparent problem, that would be extremely
interesting...
Many thanks and best wishes, Boris.
 
Boris

Thanks Mike - that's very helpful. In the meantime, I have found that, for
large n, there is a fairly straightforward linear relationship between n
(sample size) and the inverse chi-square at any given probability (and the
inverse chi-square at 50% comes out to be n+1, or v+2 where v is the degrees
of freedom). However, it is probably best to actually calculate the
chi-square values I need, so I will definitely try the function and have a
look at the notes.
Many thanks and best wishes, Boris.
 
Jerry W. Lewis

For a more accurate asymptotic approximation, see equation 3a in
http://digital.library.adelaide.edu.au/coll/special//fisher/281.pdf
In that equation, n is the degrees of freedom and x is the inverse normal
ordinate for the same probability level. The full expression gives at least
5-figure accuracy at the median (x=0) when df>=4, increasing to about
10-figure accuracy by df=100. The farther you go from the median, the more
slowly the expression converges, but it should be fine for large df unless
you are interested in extreme tail behavior.
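If you just want something you can type straight into a worksheet, a coarser
sketch in the same spirit is the well-known Wilson-Hilferty cube-root
approximation (to be clear, this is not equation 3a itself, just a simpler
relative of it). With the degrees of freedom in A1 and the upper-tail
probability in B1 (illustrative cell references):

=A1*(1-2/(9*A1)+NORMSINV(1-B1)*SQRT(2/(9*A1)))^3

Here NORMSINV(1-B1) supplies the inverse normal ordinate x for the same
probability level, since CHIINV works with upper-tail probabilities.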

The Smith library would give high accuracy everywhere.

Jerry
 
Boris

Dear Jerry,
Many thanks for the excellent reference. I have compared the Smith library
with Maple (v8), and it does indeed do the trick. For my current analysis
(data sets of between 30k and 92k members) the Fisher approach at 2.5% and
97.5% is great too, but I think I will stick with a linear approximation, as
it also gives good results (to several significant figures) in the range I
need. An even better and still simple fit for n (sample size) >= 1000 seems
to be a shifted power fit (y = a*(x-b)^c) with a around 2.7 (depending on the
probability), b positive and about 17 for the low % point and negative and
about -15 for the high % point, and c about 0.5 (the curve fits have an
associated standard error of around 0.024 and a correlation coefficient of
1.000000). However, the much more precise approach in Fisher and Cornish is
really useful.
Many thanks and best wishes, Boris
 
Boris

Dear Jerry,
I should have added that I am fitting n-chisq(n-1,0.025) and
chisq(n-1,0.975)-n to n. I could just as easily have fitted chisq() against
n, but that would give different constants for a, b and c...
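(As a purely illustrative sketch using the rough constants from my last post
- a about 2.7, b about 17, c about 0.5, which would of course need refitting
- inverting the fit gives the low-% point as roughly

=A1-2.7*(A1-17)^0.5

with the sample size n in a hypothetical cell A1.)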
Best wishes, Boris.
 
Matt

I have just discovered this CHIINV bug myself with some data for which I was
trying to compute a standard deviation confidence interval. The problem
happens as early as df=782. In fact, you can get a very nice picture of
approximately all the values at which this function explodes if you list
percentiles in the top row (i.e. A1=0.01, B1=A1+0.01, etc., up to 0.99) and
then place the following formula in the next row:
=CHIINV(A$1,ROW()+700)

Fill in the matrix and look at what starts to happen at around row 82
(df=782) and beyond. There are huge sections (especially around 50%, like you
mentioned, Boris) where the algorithm breaks down. If you zoom the
spreadsheet out to 25%, you get a nice picture of the regions in which you
can't use this algorithm. Either the bug needs to be fixed, or the exclusion
regions need to be identified in the help. :)
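For reference, this is why the bug bit me: the usual confidence interval for
a standard deviation needs CHIINV at both tails. With the sample standard
deviation in B1 and the sample size in B2 (illustrative cells), the 95%
limits are

=B1*SQRT((B2-1)/CHIINV(0.025,B2-1))    lower limit
=B1*SQRT((B2-1)/CHIINV(0.975,B2-1))    upper limit

so a #NUM! at either tail kills the whole interval once the sample gets
large.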

Matt
 
Harlan Grove

Matt said:
So, is anyone at Microsoft planning on doing something about these
bugs?
....

Don't plan on it. These bugs have been known and published for over a
decade, so it doesn't seem MSFT feels much need to fix them. There are
add-ins that provide more accurate, though slower, alternative functions.
They're your only choice other than waiting a few more decades for MSFT to
get around to fixing them.
 
Boris

Hi Matt,
I agree - at the very least, Microsoft should update their hopelessly
inadequate help file entries and list the problem areas explicitly! That, I
would have thought, would be quite simple to do but...
Best wishes, Boris.
 
