LINEST bug in Excel 2003

mathman

I have used LINEST to provide polynomial coefficients in Excel 97 through
Excel 2002 with no problems. My company has loaded Excel 2003, and now all the
existing spreadsheets give the wrong answers with no error messages. I have
read the KB articles on LINEST in 2003, and none gives a clue to the problem.

I use two columns of data for x and y. In this example, a row of 7 cells is
selected and array-entered with this formula:

=LINEST(AN3:AN11,AM3:AM11^{1,2,3,4,5,6})

Excel 97 through 2002 dutifully report 7 cells with the coefficients of
X^6, X^5, X^4, X^3, X^2, X^1 and X^0.
Excel 2003 sets the X^6 coefficient to zero regardless of the data set and
reports a wildly wrong value for the constant.

I have tested this on a clean build of Excel 2003 with the Analysis ToolPak
loaded, as well as on several of our 8000 corporate desktops. I cannot find
any reference to this insidious problem on the internet. Has anyone else
experienced it?
 
Mike Middleton

mathman -

I think you should also be concerned about the "insidious problem" you may
encounter from overfitting the data when using a sixth-order polynomial.

- Mike
www.mikemiddleton.com
 
mathman

Thanks for trying to help. I can only find one post from Jerry on 5/6/2005,
under the thread "simple statistical analysis". It did not seem relevant, but
perhaps I missed the point.
 
mathman

Thanks for your interest. The real data has many more points; the ranges were
just an example. The problem of the missing first coefficient also applies to
3rd-order, 4th-order, 5th-order fits, etc.

I have tried multiple linear fitting, and that fails in the same way in Excel
2003 but works perfectly in Excel 97, Excel 2000 and Excel 2002.
 
Jerry W. Lewis

Select 7 columns in a row and array-enter (Ctrl-Shift-Enter) the
following formula to reproduce the pre-2003 calculation, except that the
coefficients will be given in reverse order (intercept, linear, ... 6th
power).

=TRANSPOSE(MMULT(MINVERSE(MMULT(TRANSPOSE(xCol^{0,1,2,3,4,5,6}),
xCol^{0,1,2,3,4,5,6})), MMULT(TRANSPOSE(xCol^{0,1,2,3,4,5,6}), yCol)))

Note the explicit zero power in this formula.
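
For readers who want to see what this array formula computes outside Excel, here is a minimal Python sketch of the same normal-equations calculation, (X'X)^-1 X'y, with coefficients in the same ascending order (intercept first). The function names and the sample data are invented for illustration. The sketch also solves the linear system by Gaussian elimination instead of forming the explicit inverse as MINVERSE does, so it is an analogue of the formula, not a copy of Excel's internals.

```python
def design_matrix(xs, degree):
    """Rows of [x**0, x**1, ..., x**degree], like xCol^{0,1,...,6}."""
    return [[x ** p for p in range(degree + 1)] for x in xs]


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(u * v for u, v in zip(row, col)) for col in bt] for row in a]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def polyfit_normal_equations(xs, ys, degree):
    """Least-squares polynomial coefficients, intercept first."""
    X = design_matrix(xs, degree)
    Xt = transpose(X)
    XtX = matmul(Xt, X)                                   # X'X
    Xty = [sum(u * v for u, v in zip(row, ys)) for row in Xt]  # X'y
    return solve(XtX, Xty)


# Illustrative data generated from y = 1 + 2x + 3x^2 (not the OP's data).
xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [1.0 + 2.0 * x + 3.0 * x ** 2 for x in xs]
coeffs = polyfit_normal_equations(xs, ys, 2)
print(coeffs)  # approximately [1, 2, 3]
```

This approach is fine for small, well-scaled problems like the one above, but forming X'X is exactly the step that becomes numerically fragile for high-order polynomials.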

In addition to the potential for overfitting the data, I would be
concerned about numerical stability of the calculation (the reason that
Excel 2003 changed its approach).

http://groups-beta.google.com/group...3d6f9a19fe8/6c916351b5101102#6c916351b5101102

Plot the data as an "XY (Scatter)" chart and add a 6th degree polynomial
trendline (format to scientific notation with 14 decimal places); if
there is an appreciable difference between the graph and formula
coefficients, then the graph coefficients are better.

David Braden has posted code to directly extract coefficients from the
displayed chart trendline equation into cells

http://groups.google.com/[email protected]

Jerry
 
Harlan Grove

Jerry W. Lewis wrote...
....
Plot the data as an "XY (Scatter)" chart and add a 6th degree polynomial
trendline (format to scientific notation with 14 decimal places); if
there is an appreciable difference between the graph and formula
coefficients, then the graph coefficients are better.

David Braden has posted code to directly extract coefficients from the
displayed chart trendline equation into cells
....

Meaning that Microsoft is still using different pieces of code for
LINEST and chart trend lines? And the older, trend line code is
better? If so, it begs the question why Microsoft squandered their
money changing the LINEST code rather than just using the trend
line code for LINEST.
 
Jerry W. Lewis

You are correct, LINEST and chart trendline are not doing exactly the
same thing. I don't know why MS reinvented the wheel here, but it is
not the first example of redundant Excel functions which are programmed
differently.

It has been a while since I did the comparisons, but my vague
recollections are that:

- Chart is slightly better numerically than 2003 LINEST (when no
coefficient estimates are exactly zero)

- 2003 LINEST (when no coefficient estimates are exactly zero) is
comparable to lm() in S-PLUS and R, and thus is far superior to previous
versions of LINEST (except for the anomaly under discussion).

- 2003 LINEST appears to zero "small" coefficients, much like the
difference between =227.82-(227+0.75+0.05+0.02) and
=(227.82-(227+0.75+0.05+0.02)). Thankfully MS did not introduce this
"improvement" into the chart trendline.

- The nonzero coefficients from 2003 LINEST are correct for the full
problem, so you might be able (a new [untested] idea) to fit the terms
that gave zero coefficients to the residuals from the original 2003
LINEST fit, to restore the arbitrarily zeroed coefficients.
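
The subtraction example in the third point can be checked in any language that uses IEEE 754 doubles. The Python sketch below shows that the raw result is a tiny nonzero value, which is what the parenthesized Excel formula displays. The `snap_small_difference` helper and its tolerance are hypothetical, included only to illustrate the kind of "close enough to zero" rule being described; they are not Excel's actual rule.

```python
# Raw IEEE 754 double arithmetic: the decimal inputs are not exactly
# representable, so the difference is a tiny nonzero rounding residue.
raw = 227.82 - (227 + 0.75 + 0.05 + 0.02)
print(raw)         # tiny, on the order of 1e-14 in magnitude
print(raw == 0.0)  # False


def snap_small_difference(a, b, rel_tol=1e-15):
    """Hypothetical rule: treat a - b as exactly zero when it is tiny
    relative to the operands. The tolerance is an assumption, not Excel's."""
    diff = a - b
    if abs(diff) <= rel_tol * max(abs(a), abs(b)):
        return 0.0
    return diff


snapped = snap_small_difference(227.82, 227 + 0.75 + 0.05 + 0.02)
print(snapped)     # 0.0 under the assumed tolerance
```

Applied to a final displayed result, such a rule is mostly cosmetic; applied inside a regression, it can discard a coefficient that is small but genuinely nonzero.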

Jerry
 
Jerry W. Lewis

Jerry W. Lewis said:
It has been a while since I did the comparisons, but my vague
recollections are that: ....
- The nonzero coefficients from 2003 LINEST are correct for the full
problem, so you might be able (a new [untested] idea) to fit the terms
that gave zero coefficients to the residuals from the original 2003
LINEST fit, to restore the arbitrarily zeroed coefficients.

Limited testing suggests that when there are coefficients that are exactly
zero, coefficients OTHER THAN THE ASSUMED INTERCEPT TERM (apparently
calculated by subtraction at the center of data) are correct. If you
subtract off the fitted cubic and linear terms from the OP's y-data and then
fit a quadratic (with intercept) without a linear term, then you get the
correct missing coefficients.
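
The residual refit described above can be sketched in Python as follows. The data are made up for illustration (not the OP's), `fit_powers` is a hypothetical helper, and a cubic stands in for the real model. The sketch fits the full model, pretends the intercept and quadratic coefficients were lost, subtracts the retained cubic and linear terms from y, and refits only the missing terms to the residuals.

```python
def fit_powers(xs, ys, powers):
    """Least-squares coefficients for sum(c[p] * x**p for p in powers).

    Uses the normal equations, which is adequate for this small demo.
    """
    cols = [[x ** p for x in xs] for p in powers]
    n = len(powers)
    aug = [
        [sum(u * v for u, v in zip(cols[i], cols[j])) for j in range(n)]
        + [sum(u * v for u, v in zip(cols[i], ys))]
        for i in range(n)
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    sol = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * sol[j] for j in range(i + 1, n))
        sol[i] = (aug[i][n] - s) / aug[i][i]
    return dict(zip(powers, sol))


# Made-up illustrative data, not an exact polynomial.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0]
ys = [-7.9, 0.3, 1.2, 2.8, 11.1, 31.7, 70.4]

# Reference: the full cubic fit with all four coefficients.
full = fit_powers(xs, ys, [0, 1, 2, 3])

# Suppose only the cubic and linear coefficients survived. Subtract those
# terms from y, then fit intercept + quadratic (no linear term) to the rest.
residuals = [y - full[3] * x ** 3 - full[1] * x for x, y in zip(xs, ys)]
refit = fit_powers(xs, residuals, [0, 2])

print(full[0], refit[0])  # intercepts agree to rounding error
print(full[2], refit[2])  # quadratic coefficients agree to rounding error
```

This works because the full least-squares solution minimizes the residual sum of squares over all coefficients at once, so holding the correct coefficients fixed and minimizing over the rest lands on the same solution.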

Jerry
 
Harlan Grove

Jerry W. Lewis wrote...
....
Limited testing suggests that when there are coefficients that are exactly
zero, coefficients OTHER THAN THE ASSUMED INTERCEPT TERM (apparently
calculated by subtraction at the center of data) are correct. If you
subtract off the fitted cubic and linear terms from the OP's y-data and then
fit a quadratic (with intercept) without a linear term, then you get the
correct missing coefficients.

You've already acknowledged that Microsoft reinvented the wheel here.
It's almost funny that they seem to have come up with yet another
approach not as good as the one they use in chart trendlines. Or do you
mean that XL2003 LINEST is more accurate than chart trendlines for the
coefficients it happens to get right on the first pass?
 
Jerry W. Lewis

My recollection is that the chart is slightly more accurate for very
ill-conditioned polynomials such as
http://groups.google.com/groups?selm=cEe%[email protected]

I am not party to the code used by the chart polynomial trendline (if I
were, I would probably be enjoined by NDA from discussing it). It is
possible that it makes use of the fact that the model is a polynomial,
which would be inappropriate for the more general LINEST function. At
this point, I would be thrilled to get a LINEST that is not subject to
the cancellation problems inherent in forming the normal equations
(pre-2003) and does not mistakenly zero part of the solution.

Jerry
 
