Regression gives slope of zero when x values smaller than 10^-7

D

David H

I'm trying to do a simple linear regression (y = mx + b) in Excel 2007 using
the Regression function in the Data Analysis Tool Pack. When the x values
are smaller than about 1x10^-7 the output of the calculation is wrong: the
displayed slope is zero, although the trendline equation shows a non-zero
slope. For example, with the data (1e-7,1) ,(3e-7,2), (5e-7,7) ,(7e-7,10)
the Regression calculation produces a zero slope while the trendline shows a
slope of 1.6x10^7. But if I multiply the x & y values by 10 and then rerun
the Regression then the Regression output shows a slope of 1.6x10^7. Why is
Regression malfunctioning when the x values are small, and how can I prevent
this from happening?
 
J

Jan Karel Pieterse

Hi David,
Why is
Regression malfunctioning when the x values are small, and how can I prevent
this from happening?

I'm not saying this is not a bug, but generally speaking, regression analysis
involves summing the squares of the values. Since the difference in magnitude
of your x and y values is huge, this may be the cause of the trouble.

One thing you can do to avoid this is by rescaling your data so they are closer
together and then after the regression, rework the result to match the original
numbers.

Say you multiply your real x's with 1000.

Suppose you get a regression formula in this shape:

y = a + b * x

Because you multiplied your x values with 1000, x is in fact:

x = 1000 * x_real

and thus

x_real = x / 1000

So to translate your regression result to use your real data x_real:

y = a + b * (x_real / 1000)

Or:

y = a + (b / 1000) * x_real

Regards,

Jan Karel Pieterse
Excel MVP
http://www.jkp-ads.com
Member of:
Professional Office Developer Association
www.proofficedev.com
 
D

David H

Jan,
scaling the x values is indeed one of the things I've done in my own work,
but it's a less than ideal solution for my students, many of whom are
relatively inexperienced with Excel and somewhat math-phobic (sigh). And the
problem doesn't seem to be related to the large difference in x and y values,
because if I scale BOTH x and y the correct slope is displayed.
 
J

Jerry W. Lewis

The coefficient of variation (STDEV/AVERAGE) of x values is more relevant for
numerical difficulty than the difference in magnitude of x and y values.

In Excel versions prior to 2007, the ATP regression tool appears to use
LINEST as its calculation engine; what does LINEST do for this problem in
2007? In 2003, LINEST agrees with SLOPE and INTERCEPT to 15 decimal digits.

Jerry
 
J

Jan Karel Pieterse

Hi Jerry,
The coefficient of variation (STDEV/AVERAGE) of x values is more relevant for
numerical difficulty than the difference in magnitude of x and y values.

Duh, I knew I wasn't exactly right, but couldn't remember what the problem was
indeed. Thanks.

Even so, scaling the data back to -say- -1 to +1 before doing the regression
isn't a bad idea.

Regards,

Jan Karel Pieterse
Excel MVP
http://www.jkp-ads.com
Member of:
Professional Office Developer Association
www.proofficedev.com
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top