Excel and the Math Coprocessor for DLLs


Lynn McGuire

Okay. My mistake.

There is a difference(!) when calling InitTestDLL from XL2003 VBA v. XL2010 VBA. And I misinterpreted the results from XL2003 VBA.

When InitTestDLL is called from XL2003 VBA, the last MsgBox displayed by checkMathCoprocessorStatus says "was 0x127f" and "is now
0x127f" the first time.

But subsequent times, it says "was 0x137f" and "is now 0x137f".

That led me to conclude that your DLL is changing the FPU control word in some way completely unrelated to the code we see in your
testdll.c.

But when InitTestDLL is called from XL2010 VBA, we see "was 0x137f" and "is now 0x137f" for the first call and for all subsequent calls.

That is consistent with the code in your testdll.c

Ergo, the change that I see with XL2003 VBA is probably due to VBA, not your code.

(Although I had used XL2003 for my initial test of your code, I had switched to XL2010 unconsciously for my subsequent testing. Mea
culpa!)

Based on Martin's assumption that your version of _control87 is displaying the actual FPU control word, 0x127f corresponds to _PC_53
+ _RC_NEAR, and 0x137f corresponds to _PC_64 + _RC_NEAR.
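(For anyone following along, here is a minimal sketch that decodes those two values under that same assumption, i.e. treating them as the raw x87 control word rather than the abstracted _MCW_* flags that MSVC's _control87 reports. The helper names are made up.)

#include <stdio.h>

/* PC field is bits 8-9 of the raw control word; RC field is bits 10-11. */
static const char *precisionOf (unsigned cw)
{
    switch ((cw >> 8) & 0x3) {
    case 0x0: return "24-bit (single)";
    case 0x2: return "53-bit (double)";
    case 0x3: return "64-bit (extended)";
    default:  return "reserved";
    }
}

static const char *roundingOf (unsigned cw)
{
    switch ((cw >> 10) & 0x3) {
    case 0x0: return "round to nearest";
    case 0x1: return "round down";
    case 0x2: return "round up";
    default:  return "round toward zero";
    }
}

int main (void)
{
    unsigned words [2] = { 0x127f, 0x137f };
    int i;
    for (i = 0; i < 2; i++)
        printf ("0x%x: %s, %s\n", words [i], precisionOf (words [i]), roundingOf (words [i]));
    return 0;   /* prints 53-bit/nearest for 0x127f, 64-bit/nearest for 0x137f */
}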

That would indeed explain the results that we observe with your testdll.dll, namely: chptst is zero the first time, but about
2.85E-10 subsequent times when using XL2003.

But when using XL2010, you see about 2.85E-10 consistently.

As Martin and I have said repeatedly, the remedy is for you to call _control87(_PC_53 + _RC_NEAR, _MCW_PC + _MCW_RC) at the
__beginning__ of InitTestDLL.

(Technically, _RC_NEAR and _MCW_RC are not needed since it appears that _RC_NEAR is already set. But it is good "defensive
programming" to set both modes.)

Alternatively, as I explained previously, you can call a DLL routine directly from the VBA procedure CommandButton1_Click. The DLL
routine would call _control87(_PC_53 + _RC_NEAR, _MCW_PC + _MCW_RC).
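A sketch of that alternative (the routine name, return type, and the VBA Declare line are my own invention, shown only to make the idea concrete):

#include <float.h>

/* VBA side (hypothetical):
     Declare Function SetFpuDoublePrecision Lib "testdll.dll" () As Long
   and call it first thing in CommandButton1_Click. */
unsigned __stdcall SetFpuDoublePrecision (void)
{
    /* Return the resulting control-word flags so VBA can inspect them. */
    return _control87 (_PC_53 + _RC_NEAR, _MCW_PC + _MCW_RC);
}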

That would obviate the need to call _control87 from each DLL routine that you might call from CommandButton1_Click.

But you would need to call that DLL routine from every VBA procedure that you initiate, either with an Excel "button" or by calling a
VBA function from an Excel formula.

So arguably, it is more reliable to call _control87 (or a DLL routine) from each entry point that you allow to be called from VBA.

Done. No better calcs.

The Math Coprocessor status is 0x4020 and control is 0x137f
The Math Coprocessor status is is now 0x127f
The Math Coprocessor status is 0x4000 and control is 0x127f
The Math Coprocessor status is is now 0x127f
The Math Coprocessor status is 0x4000 and control is 0x127f
The Math Coprocessor status is is now 0x127f

I call the following code from each entry point.

unsigned checkMathCoprocessorStatus ()
{
unsigned old87Status = 0;
unsigned old87Control = 0;
unsigned new87result = 0;

old87Status = _status87 ();
old87Control = _control87 (0, 0);
//if (old87Status != 0)
{
char msg [4096];
sprintf (msg, "\nThe Math Coprocessor status is 0x%x and control is 0x%x\n\n", old87Status, old87Control);
writeLineToScreen (msg);
// new87result = _control87 (_PC_64 + _RC_NEAR, _MCW_PC + _MCW_RC);
// new87result = _control87 (_PC_53, _MCW_PC);
// new87result = _control87 (_PC_64, _MCW_PC);
// new87result = _control87 (_RC_NEAR, _MCW_RC);
// _asm { FINIT };
//new87result = _control87 (0x27f, 0xffff);
//new87result = _control87 (0, 0);
new87result = _control87 (_PC_53 + _RC_NEAR, _MCW_PC + _MCW_RC);
sprintf (msg, "\nThe Math Coprocessor status is is now 0x%x\n\n", new87result);
writeLineToScreen (msg);
}

return old87Status;
}

Thanks,
Lynn
 

joeu2004

Lynn McGuire said:
Done. No better calcs.

What does that mean exactly?

That is, what is chptst each time?

At the very least, it should be consistent. Is it?


Lynn McGuire said:
The Math Coprocessor status is 0x4020 and control is 0x137f
The Math Coprocessor status is is now 0x127f
The Math Coprocessor status is 0x4000 and control is 0x127f
The Math Coprocessor status is is now 0x127f
The Math Coprocessor status is 0x4000 and control is 0x127f
The Math Coprocessor status is is now 0x127f

I call the following code from each entry point.

What does that output represent from the VBA side?

That is, are you calling 3 DLL routines (or the same routine 3 times) in a
single "execution" (e.g. a single click of CommandButton1_Click)?

Or are you doing 3 "executions" (e.g. clicking CommandButton1_Click 3
times)?

-----

Finally, as interesting as all this might be academically, what practical
use is this to you, ultimately?

Forget about these experiments, interesting as they might be.

What is the final application?

If your ultimate purpose is indeed to ferret out the FP behavior of VBA and/or
Watcom, fine.

Or are you doing computations that demand accuracy to 53 bits (mantissa), for
example some scientific application?

Otherwise, do you understand that none of this should matter to you? (Other
than as a curiosity.)

Most properly-designed algorithms with FP computations, e.g. business
algorithms, should be immune to the details of the precision and rounding in
the least-significant bit, unless you are dealing with trillions (US) of
dollars, for example.

Do you understand that?

Do you understand that such "properly-designed" algorithms should explicitly
round most numerical computations that involve non-integer values?

Is there any reason why that is not a good solution for your purposes?
 

Martin Brown

Done. No better calcs.

OK. There is something odd here. The link map for your DLL suggests that
the code includes the FPU emulator emu387.lib code as well (which AFAIR
was obsoleted more than a decade ago). It is entirely possible that the
387 emulator cuts enough corners that it cannot compute the FPU test sum
right even when asked to do so.

Incidentally your compiler has optimised *at compile time* the original
division that you think you are testing to a multiplication by the
reciprocal. It is a technique called strength reduction.

That is (X/Y)*Y - X was compiled as X * (1/Y) * Y - X
where the value (1/Y) was computed by the compiler and not by the FPU!
(almost any modern optimising compiler will do this because at runtime
an FP division is significantly more expensive than a multiply)
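Written out by hand, the transformation looks like this (the function names are mine; the real compiler does this silently whenever the divisor is a compile-time constant):

double as_written (double x)
{
    return (x / 3145727.0) * 3145727.0 - x;
}

double as_compiled (double x)
{
    /* 1.0/3145727.0 is folded by the compiler, so no FDIV is ever executed. */
    return x * (1.0 / 3145727.0) * 3145727.0 - x;
}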

MS Visual C goes further and spots that the result is identically zero.
With the optimiser on, it compiles to FLDZ.

What you are seeing here could well be a defect in how the *compiler*
optimises expressions since it will quite likely use a 64bit mantissa
here. I now think that is the root cause of your "problem".
The Math Coprocessor status is 0x4020 and control is 0x137f
The Math Coprocessor status is is now 0x127f
The Math Coprocessor status is 0x4000 and control is 0x127f
The Math Coprocessor status is is now 0x127f
The Math Coprocessor status is 0x4000 and control is 0x127f
The Math Coprocessor status is is now 0x127f

OK. The math coprocessor is now in the right state. With this change
does your fluid mechanics code now give the right expected answers?
I call the following code from each entry point.

unsigned checkMathCoprocessorStatus ()
{
unsigned old87Status = 0;
unsigned old87Control = 0;
unsigned new87result = 0;

old87Status = _status87 ();
old87Control = _control87 (0, 0);
//if (old87Status != 0)
{
char msg [4096];
sprintf (msg, "\nThe Math Coprocessor status is 0x%x and control is
0x%x\n\n", old87Status, old87Control);
writeLineToScreen (msg);
// new87result = _control87 (_PC_64 + _RC_NEAR, _MCW_PC + _MCW_RC);
// new87result = _control87 (_PC_53, _MCW_PC);
// new87result = _control87 (_PC_64, _MCW_PC);
// new87result = _control87 (_RC_NEAR, _MCW_RC);
// _asm { FINIT };
//new87result = _control87 (0x27f, 0xffff);
//new87result = _control87 (0, 0);
new87result = _control87 (_PC_53 + _RC_NEAR, _MCW_PC + _MCW_RC);
sprintf (msg, "\nThe Math Coprocessor status is is now 0x%x\n\n",
new87result);
writeLineToScreen (msg);
}

return old87Status;
}
e.g.

void checkMathCoprocessorStatus (unsigned &old87Status, unsigned
&old87Control)
{
old87Status = _status87 ();
old87Control = _control87 (0, 0);
}

unsigned setMathCoprocessorStatus (unsigned value, unsigned mask)
{
unsigned old87Control = _control87 (value, mask);
return old87Control;
}
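A hypothetical entry point using those two helpers (signatures as above, so C++ reference parameters) might then look like:

#include <float.h>

void SomeEntryPoint ()
{
    unsigned status = 0, control = 0;
    unsigned previous;

    checkMathCoprocessorStatus (status, control);   /* record what VBA handed us */
    previous = setMathCoprocessorStatus (_PC_53 + _RC_NEAR, _MCW_PC + _MCW_RC);

    /* ... floating-point work ... */

    setMathCoprocessorStatus (previous, _MCW_PC + _MCW_RC);   /* restore on the way out */
}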

To be able to help further, strip out every last thing that isn't
absolutely essential from the code and make all routines individually
callable from VBA. Also disable all compiler optimisations and tell it
to assume an FPU is fitted (this alone may fix your problem once the
right rounding mode is enabled, and that seems to have worked OK).

Also make the value "Y" used in the (X/Y)*Y-X expression a parameter to
the calling routine - that way the compiler cannot precompute (1/Y).
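For example (the routine name is made up), something along these lines keeps the divide at run time:

double chptstParam (double x, double y)
{
    double divtwo = x / y;      /* a genuine runtime divide: y is not known at compile time */
    return (divtwo * y) - x;
}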
 

joeu2004

Martin Brown said:
There is a difference(!) when calling InitTestDLL from
XL2003 VBA v. XL2010 VBA. And I misinterpreted the results
from XL2003 VBA.
[....]
This is also the behaviour in XL2007 so it looks like
XL2010 is the point where MS went back to using full
64 bit mantissas (80 bit reals).

I don't believe so.

Normally, XL2003 VBA does compute using _PC_64, just as XL2007 and XL2010
VBA do.

However, I did uncover a defect in XL2003 VBA that is reproducible
nearly 100% of the time.

Inexplicably, after we import a module into XL2003 VBA (SP3), the FPU
control word is set to _PC_53. And apparently it remains in that state
until we change it by calling _controlfp ourselves.

That does not happen in XL2007 and XL2010.

I say "nearly 100% of the time" because I believe there were a couple times
out of 20 or 30 when that did not happen; that is, when the precision mode
continued to be _PC_64.

Be that as it may, that has nothing to do with Lynn's problems.

You seem to be doing a great job of disassembling his DLL and ferreting out
the root cause.

So I will leave you to it.
 

Martin Brown

Martin Brown said:
There is a difference(!) when calling InitTestDLL from
XL2003 VBA v. XL2010 VBA. And I misinterpreted the results
from XL2003 VBA.
[....]
This is also the behaviour in XL2007 so it looks like
XL2010 is the point where MS went back to using full
64 bit mantissas (80 bit reals).

I don't believe so.

Normally, XL2003 VBA does compute using _PC_64, just as XL2007 and
XL2010 VBA do.

However, I did uncover a defect in XL2003 VBA that is reproducible
nearly 100% of the time.

Inexplicably, after we import a module into XL2003 VBA (SP3), the FPU
control word is set to _PC_53. And apparently it remains in that state
until we change it by calling _controlfp ourselves.

That does not happen in XL2007 and XL2010.

That is odd though. For me here on XL2007 it does.

I am using XL2007 SP3 here and I see exactly consistent behaviour with
control word set to _PC_53 on entry just like with XL2003. It gets the
"right" answer the first time through but fails on subsequent goes when
the control word has been reset to _PC_64. I think by his application.

One way to tell would be to set _PC_24 and see if you get handed back
the FPU in that parlous state the second time around.
I say "nearly 100% of the time" because I believe there were a couple
times out of 20 or 30 when that did not happen; that is, when the
precision mode continued to be _PC_64.

Be that as it may, that has nothing to do with Lynn's problems.

You seem to be doing a great job of disassembling his DLL and ferreting
out the root cause.

So I will leave you to it.

I am curious now as to what is actually going on.

The alleged "fault" should have no bearing on the DLLs computational
accuracy since _PC_64 maintains additional guard digits beyond what
normal _PC_53 REAL*8 computations can manage. The final results may
differ slightly but a 64 bit mantissa should give better results!
 

joeu2004

Martin Brown said:
That is odd though. For me here on XL2007 it does.

I am using XL2007 SP3 here and I see exactly consistent behaviour with
control word set to _PC_53 on entry just like with XL2003. It gets the
"right" answer the first time through but fails on subsequent goes when
the control word has been reset to _PC_64. I think by his application.

(FYI, I am using XL2007 SP2. I will have to look into updating to SP3.)

Yes, "by his application".

Please note that the behavior that I described is __independent__ of Lynn's
DLL.

I agree that something hinky is happening with Lynn's DLL, which not only
includes Watcom C modules, but also Watcom F77 modules.

Instead, I am using the following VBA module to infer the state of the FPU.

Sub testit2()
Const factor As Double = 2 ^ -24 + 2 ^ -53 + 2 ^ -55
Dim z As Double, x As Double
z = 1 + factor + factor
x = 1 + factor: x = x + factor
MsgBox "z-1 = " & Format(z - 1, "0.0000000000E+00") & _
vbNewLine & "x-1 = " & Format(x - 1, "0.0000000000E+00")
End Sub

The expected output is:

z-1 = 1.1920928977E-07
x-1 = 1.1920928999E-07

x-1 always emulates the behavior of _PC_53. z-1 demonstrates the state of
the FPU. 1.1...77E-7 is indicative of _PC_64 + _RC_NEAR; 1.1...99E-7 is
indicative of _PC_53 + _RC_NEAR.

Note that 1.1...77E-7 is exactly 2^-23 + 2^-52, whereas 1.1...99E-7 is
exactly 2^-23 + 2^-51.
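(Those two closed forms are easy to check; both sums are exactly representable as doubles, so the little program below prints the same thing in any precision mode:)

#include <math.h>
#include <stdio.h>

int main (void)
{
    printf ("2^-23 + 2^-52 = %.10e\n", ldexp (1.0, -23) + ldexp (1.0, -52));  /* 1.1920928977e-07 */
    printf ("2^-23 + 2^-51 = %.10e\n", ldexp (1.0, -23) + ldexp (1.0, -51));  /* 1.1920928999e-07 */
    return 0;
}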

-----

Martin Brown said:
I am curious now as to what is actually going on.
The alleged "fault" should have no bearing on the DLLs computational
accuracy since _PC_64 maintains additional guard digits beyond what normal
_PC_53 REAL*8 computations can manage.

That is incorrect.

As demonstrated by "testit2" above, when the FPU state is _PC_53, each FP
operation (e.g. addition, multiplication, divide, negate) is calculated
using the full 64-bit mantissa of the 80-bit register, but then it is
rounded to the 53-bit mantissa according to the FPU state -- _RC_NEAR unless
we change it with a call to _control87.

It is the _PC_53 rounding of each FP operation that makes the difference.
With _PC_64, rounding is done only after all FP operations, when the FP
register is stored into 64-bit memory (type Double).

Note: That assumes that the compiler relies on the FP registers for
intermediate results, as VBA does. If it stores each operation into 64-bit
memory, as I believe Excel does, in effect it is emulating _PC_53 mode
despite the _PC_64 state.

But what is happening in Lynn's DLL is a total mystery. As you seem to have
discovered, the DLL code itself is doing unexpected things. If you are
correct that it uses an FPU __emulator__, that might explain some of the
inconsistencies that I discovered, e.g. differences between the result of
"testit2" and Lynn's compilation of InitTestDLL.

However, finding a reference to emu387.lib in the DLL link map does not mean
that that code is being executed. I don't know much about the Windows DLL
architecture, and nothing about Watcom compilations. But the emu387.lib
reference might be invoked somehow automagically only if computer lacks real
FPU support. Otherwise, the emu387.lib might simply be dormant code.

Nonetheless, even if emu387.lib is not the explanation, I am convinced that
Lynn's DLL does unexpected things -- inexplicable things for the time
being. We should not draw over-generalizations about "VBA" behavior based
on how VBA and Lynn's DLL interact.

-----

And apparently it remains in that state
until we change it by calling _controlfp ourselves.

..... Or until we open another workbook in the same Excel instance.

..... Or until we delete a module when at least two workbooks are open in the
same Excel instance -- usually(!).

This is clearly defective behavior of XL2003, not intentional and expected
behavior, IMHO.
 

joeu2004

PS.... "Martin Brown said:
The final results may differ slightly but a 64 bit mantissa
should give better results!

I forgot to address this point. I'm really pressed for time....

The result of (x/y)*y-x with the specific chptst constants will be exactly
zero when evaluated __consistently__ in _PC_53 and _PC_64 modes.

That is a coincidence. There are values of x and y for which the result is
not zero in either mode, not always at the same time.

The operative word is "consistently". And I am only talking about the
one-line expression (x/y)*y-x.

Lynn's implementation is not consistent, assuming no "subexpression
elimination" optimization by the compiler.

He computes x/y and stores it (call it z). Effectively, that is the same as
_PC_53 mode.

Then he computes z*y-x.

When the latter is calculated in _PC_53 mode, the result is zero because
that is the same as (x/y)*y-x in _PC_53 mode.

When the latter is calculated in _PC_64 mode, the result is non-zero because
we have mixed modes: _PC_53 rounding for z, but _PC_64 for the rest of the
computation.

In contrast, if Lynn had written (x/y)*y-x in one expression, which is the
correct way to test the FDIV defect, it would result in zero in _PC_64 mode.
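Side by side, in C-like terms (the names are mine, not Lynn's actual code):

double twoStep (double x, double y)
{
    double div = x / y;     /* the store rounds the quotient to a 53-bit mantissa */
    return div * y - x;     /* mixed precision if the FPU is left in _PC_64 */
}

double oneExpr (double x, double y)
{
    return (x / y) * y - x; /* intermediates can stay at one precision throughout */
}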

Again, all of this is by coincidence; that is, it is specific to the
constants in the chptst code.
 

Lynn McGuire

I said:
There is a difference(!) when calling InitTestDLL
from XL2003 VBA v. XL2010 VBA. [....]
When InitTestDLL is called from XL2003 VBA, the last
MsgBox displayed by checkMathCoprocessorStatus says
"was 0x127f" and "is now 0x127f" the first time. But subsequent times, it says "was 0x137f" and
"is now 0x137f". [....]
But when InitTestDLL is called from XL2010 VBA, we
see "was 0x137f" and "is now 0x137f" for the first
call and for all subsequent calls.

Okay, this is all just a little too weird even for me.

The difference in XL2010 VBA that I saw happens when I click on the testdll.xls file icon, which opens with XL2010 on my system.

But when I open XL2010 and open testdll.xls from the Recent Workbooks list, it behaves the same as XL2003 VBA above.

Moreover, despite displaying "was 0x137f" and "is now 0x137f", indicative of _PC_64 if Martin is correct, the chptst test returns
zero the first time, indicative of _PC_53.

But that is inconsistent with the results of the following test, which I put in the beginning of CommandButton1_Click:

Const factor As Double = 2 ^ -24 + 2 ^ -53 + 2 ^ -55
Const z As Double = 1 + factor + factor
Dim x As Double
x = 1 + factor: x = x + factor
MsgBox Format(z - 1, "0.0000000000E+00") & vbNewLine & _
Format(x - 1, "0.0000000000E+00")

The result of x-1 is about 1.1920928999E-7, which emulates _PC_53 precision.

The result of z-1 is about 1.1920928977E-7, which is indicative of _PC_64 precision.

I cannot explain these inconsistencies.

On the other hand, when I compile your testdll.c using VC++, I get consistent results with XL2003 and XL2010 VBA, no matter how I
start testdll.xls (click on the file and open in XL2010, or start XL2003 and XL2010 and open testdll.xls within).

With the recompiled testdll.c, chptst is always about 2.85E-10, indicative of _PC_64 precision. And checkMathCoprocessorStatus always
displays "was 0x0" and "is now 0x0", indicative of _PC64 + _RC_NEAR (based on VC++ values).

Notes:

1. When I recompiled testdll.c, I changed sprintf(...,old87Status,old87Control,new87result) to sprintf(...,old87Status,old87Control &
mask,new87result & mask), where mask is _MCW_PC + _MCW_RC. In other words, checkMathCoprocessorStatus only displays the "PC" and "RC"
states.

2. Also, I commented out the call to capsmn in InitTestDLL, since I did not know how to link to it. (I am new to VC++.)

Any conclusion on my part would be just wild speculation.

The call to capsmn is to my other test DLL, testdll1.dll.
It is written in a mixture of F77, C and C++.

I am beginning to think that I have a computational
problem in my big DLL that Excel is not causing but
exacerbating. I need to check this out.

I am also moving my business on Saturday so I will
be going offline for a few days.

Thanks,
Lynn
 

Lynn McGuire

Done. No better calcs.

OK. There is something odd here. The link map for your DLL suggests that the code includes the FPU emulator emu387.lib code as well
(which AFAIR was obsoleted more than a decade ago). It is entirely possible that the 387 emulator cuts enough corners that it cannot
compute the FPU test sum right even when asked to do so.

Incidentally your compiler has optimised *at compile time* the original division that you think you are testing to a multiplication
by the reciprocal. It is a technique called strength reduction.

That is (X/Y)*Y - X was compiled as X * (1/Y) * Y - X
where the value (1/Y) was computed by the compiler and not by the FPU!
(almost any modern optimising compiler will do this because at runtime an FP division is significantly more expensive than a multiply)

MS Visual C goes further and spots that the result is identically zero. With the optimiser on, it compiles to FLDZ.

What you are seeing here could well be a defect in how the *compiler* optimises expressions since it will quite likely use a 64bit
mantissa here. I now think that is the root cause of your "problem".
The Math Coprocessor status is 0x4020 and control is 0x137f
The Math Coprocessor status is is now 0x127f
The Math Coprocessor status is 0x4000 and control is 0x127f
The Math Coprocessor status is is now 0x127f
The Math Coprocessor status is 0x4000 and control is 0x127f
The Math Coprocessor status is is now 0x127f

OK. The math coprocessor is now in the right state. With this change does your fluid mechanics code now give the right expected answers?
I call the following code from each entry point.

unsigned checkMathCoprocessorStatus ()
{
unsigned old87Status = 0;
unsigned old87Control = 0;
unsigned new87result = 0;

old87Status = _status87 ();
old87Control = _control87 (0, 0);
//if (old87Status != 0)
{
char msg [4096];
sprintf (msg, "\nThe Math Coprocessor status is 0x%x and control is
0x%x\n\n", old87Status, old87Control);
writeLineToScreen (msg);
// new87result = _control87 (_PC_64 + _RC_NEAR, _MCW_PC + _MCW_RC);
// new87result = _control87 (_PC_53, _MCW_PC);
// new87result = _control87 (_PC_64, _MCW_PC);
// new87result = _control87 (_RC_NEAR, _MCW_RC);
// _asm { FINIT };
//new87result = _control87 (0x27f, 0xffff);
//new87result = _control87 (0, 0);
new87result = _control87 (_PC_53 + _RC_NEAR, _MCW_PC + _MCW_RC);
sprintf (msg, "\nThe Math Coprocessor status is is now 0x%x\n\n",
new87result);
writeLineToScreen (msg);
}

return old87Status;
}
e.g.

void checkMathCoprocessorStatus (unsigned &old87Status, unsigned &old87Control)
{
old87Status = _status87 ();
old87Control = _control87 (0, 0);
}

unsigned setMathCoprocessorStatus (unsigned value, unsigned mask)
{
unsigned old87Control = _control87 (value, mask);
return old87Control;
}

To be able to help further, strip out every last thing that isn't absolutely essential from the code and make all routines
individually callable from VBA. Also disable all compiler optimisations and tell it to assume an FPU is fitted (this alone may fix
your problem once the right rounding mode is enabled, and that seems to have worked OK).

Also make the value "Y" used in the (X/Y)*Y-X expression a parameter to the calling routine - that way the compiler cannot precompute
(1/Y).

Very impressive analysis!

Wow, I did not know that Open Watcom was
reorganizing my floating point code for me even in
debug mode. That sucks. However, we started using
the Watcom compilers back in 1994 or so because the
floating point speed was so outstanding. All the
games were compiled with it for the same reason.

No, I still have slightly bad results from my big
DLL. The problem that I am really having is that
the results vary depending on the previous results.
Not good, and it is obvious to the user even if
the variation is only 1.0e-5 (10 ppm).

I am beginning to think that I have a computational
problem in my big DLL that Excel is not causing but
exacerbating. I need to check this out.

I am also moving my business on Saturday so I will
be going offline for a few days.

Thanks,
Lynn
 

Martin Brown

I forgot to address this point. I'm really pressed for time....

The result of (x/y)*y-x with the specific chptst constants will be
exactly zero when evaluated __consistently__ in _PC_53 and _PC_64 modes.

That is a coincidence. There are values of x and y for which the result
is not zero in either mode, not always at the same time.

It also works out exactly in all three precision modes (24-, 53-, and 64-bit
mantissa) provided that round-to-nearest is enabled.
The operative word is "consistently". And I am only talking about the
one-line expression (x/y)*y-x.

Lynn's implementation is not consistent, assuming no "subexpression
elimination" optimization by the compiler.

Indeed.

The store back to memory of x/y *and* the compiler's precomputation at
compile time of (1/y) to full precision make the test meaningless.

It is a shame that no modern compilers implement TEMPREAL or REAL*10 for
those of us that do know what we are doing. ISTR Mickeysoft withdrew
that feature at around Visual C++ v6 for "better numerical consistency"
- when what they meant was to avoid support calls.

Using 10 byte reals has performance implications for alignment.
He computes x/y and stores it (call it z). Effectively, that is the same
as _PC_53 mode.

Then he computes z*y-x.

When the latter is calculated in _PC_53 mode, the result is zero because
that is the same as (x/y)*y-x in _PC_53 mode.

When the latter is calculated in _PC_64 mode, the result is non-zero
because we have mixed modes: _PC_53 rounding for z, but _PC_64 for the
rest of the computation.

In contrast, if Lynn had written (x/y)*y-x in one expression, which is
the correct way to test the FDIV defect, it would result in zero in
_PC_64 mode.

Trouble is that a lot of compilers do know about identities "y/y" and
"x-x" these days although they seldom eliminate in debug mode. I suspect
that your suggested code rearrangement above will "fix" the "apparent"
rounding problem in Watcom too. Though I have a hunch it will compile as
FLD1, i.e. precomputing the numeric constant "y/y".

If it allows embedded assembly then the following code in VC will do the
test explicitly if inserted before "return chptst"

_asm{
// divtwo = top / bottom;
fld qword ptr [top]
fdiv qword ptr [bottom]
// chptst = (divtwo * bottom) - top;
fmul qword ptr [bottom]
fsub qword ptr [top]
fstp qword ptr [chptst]
}

This gives total control of the instructions used and leaves all
intermediate results on the stack.
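(For concreteness, a hypothetical surrounding function might look like the following; 32-bit MSVC only, since x64 builds do not allow inline _asm. Variable names follow the comments above.)

double chptstAsm (void)
{
    double top = 4195835.0, bottom = 3145727.0, chptst;

    _asm {
        fld   qword ptr [top]      /* divtwo = top / bottom */
        fdiv  qword ptr [bottom]
        fmul  qword ptr [bottom]   /* chptst = (divtwo * bottom) - top */
        fsub  qword ptr [top]
        fstp  qword ptr [chptst]
    }
    return chptst;
}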
Again, all of this is by coincidence; that is, it is specific to the
constants in the chptst code.

And as we have both pointed out several times the "error" here is
insignificant and not indicative of any problem with the FPU.
 

joeu2004

Martin Brown said:
If it allows embedded assembly then the following
code in VC will do the test explicitly if inserted
before "return chptst"

Let's not get side-tracked by methods to make the one example behave
consistently and as expected. AFAIK, that is not the issue that Lynn is
dealing with. (Although he is the final arbiter on that point.)

IIRC, the issue is: Lynn discovered that (some) floating-point computations
(initially not the chptst example) did not return exactly the same result
when called from VBA and when linked into a program. He probably expected
an exact zero, which made the difference "obvious" to him.

I suspect Lynn interpreted that as a defect, and in Googling for an
explanation, he stumbled across the decades-old and long-since fixed FDIV
defect. When Lynn implemented the FDIV example (incorrectly), he discovered
that it did indeed return zero in his program, but not exactly zero (albeit
infinitesimally close) when called from VBA.

Incorrectly thinking that this explained the original disparity that he
encountered, Lynn became fixated on this one example.

But my point is: any example would do. So there is no point in taking
draconian steps to make the one example -- the chptst computation -- work.

Moreover, although it has been interesting to explore the behavior of the
FPU control word with respect to VBA v. Watcom and VC++, I believe that is
not the right solution for Lynn.

Instead of setting the FPCW precision mode, Lynn should recognize that such
floating-point inconsistencies are commonplace. For example, it is not
unusual to experience them when doing "the same" calculation in Excel and in
VBA.

Again, the solution is to explicitly round the results of calculations when
you expect results to be consistent to a specific precision.

Alternatively, design algorithms to tolerate infinitesimal differences in
floating-point calculations. For example, instead of test A=B, test
ABS(A-B)<e, where "e" is some difference that reflects your tolerance for
error, e.g. 0.005.
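A minimal sketch of that kind of test (the helper name and the epsilon are illustrative only; pick a tolerance that reflects your own accuracy requirements):

#include <math.h>

int nearlyEqual (double a, double b, double epsilon)
{
    return fabs (a - b) < epsilon;
}

/* e.g. instead of   if (chptst == 0.0)
   write             if (nearlyEqual (chptst, 0.0, 1e-9))   */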
 

Lynn McGuire

Let's not get side-tracked by methods to make the one example behave consistently and as expected. AFAIK, that is not the issue that
Lynn is dealing with. (Although he is the final arbiter on that point.)

IIRC, the issue is: Lynn discovered that (some) floating-point computations (initially not the chptst example) did not return exactly
the same result when called from VBA and when linked into a program. He probably expected an exact zero, which made the difference
"obvious" to him.

I suspect Lynn interpreted that as a defect, and in Googling for an explanation, he stumbled across the decades-old and long-since
fixed FDIV defect. When Lynn implemented the FDIV example (incorrectly), he discovered that it did indeed return zero in his program,
but not exactly zero (albeit infinitesimally close) when called from VBA.

Incorrectly thinking that this explained the original disparity that he encountered, Lynn became fixated on this one example.

But my point is: any example would do. So there is no point in taking draconian steps to make the one example -- the chptst
computation -- work.

Moreover, although it has been interesting to explore the behavior of the FPU control word with respect to VBA v. Watcom and VC++, I
believe that is not the right solution for Lynn.

Instead of setting the FPCW precision mode, Lynn should recognize that such floating-point inconsistencies are commonplace. For
example, it is not unusual to experience them when doing "the same" calculation in Excel and in VBA.

Again, the solution is to explicitly round the results of calculations when you expect results to be consistent to a specific precision.

Alternatively, design algorithms to tolerate infinitesimal differences in floating-point calculations. For example, instead of test
A=B, test ABS(A-B)<e, where "e" is some difference that reflects your tolerance for error, e.g. 0.005.

Nope, I have had the FDIV check in our software
since 1995. We had several customers all of a
sudden reporting bad results with their new PCs.
Here is the comment that I put in the code for
that subroutine:

C 03/27/95 Lynn McGuire add pentium math coprocessor test

That test used to work just fine. I suspect that
the Watcom floating point code optimization has
gotten better since 1995. Since I don't have a
bad FPU to check this code with anymore, I
probably need to pull it out of our code.

Lynn
 

joeu2004

Lynn McGuire said:
Nope, I have had the FDIV check in our software
since 1995.

Okay. My bad! In that case, Martin's assembly-language implementation is
just the thing for you. Your original test design was incorrect; it worked
only by coincidence.
 

Martin Brown

Okay. My bad! In that case, Martin's assembly-language implementation is
just the thing for you. Your original test design was incorrect; it
worked only by coincidence.

Strictly it worked OK in a standard Fortran REAL*8 aka 53bit mantissa
environment provided that the compiler doesn't get too clever. The store
to an 8 byte real is harmless there.

However, everything breaks if the arithmetic uses a 64-bit mantissa but only
53 bits are stored and the result ends up different in the least significant bit.

This should not matter at all unless your algorithm is extremely
sensitive to roundoff - you still have almost 16 good decimal digits.

All decent commercial compilers are smart enough to optimise divide by a
constant to multiply by a reciprocal these days. So the test is not
actually testing the divide instruction at all any more.
 

joeu2004

Martin Brown said:
provided that the compiler doesn't get too clever

__Of_course__, all of my comments assume that compile-time optimizations are
avoided.

That goes without saying, if our intent is to test the FDIV defect or the
effect of different precision modes (_PC_64, _PC_53, _PC_24).

(Although it was useful to Lynn that you discovered that his compiler is
optimizing the code.)


Martin Brown said:
Your original test design was incorrect;
it worked only by coincidence.

Strictly it worked OK in a standard Fortran REAL*8
aka 53bit mantissa environment [...]. The store to
an 8 byte real is harmless there.

However, everything breaks if the arithmetic 64bit
mantissa but only 53 bits stored the result ends up
different in the least significant bit.

When I said that Lynn's original chptst implementation was incorrect, I was
referring to the rounding of the __intermediate__ result (only the division)
to 64-bit representation (53-bit mantissa).

For the FDIV defect, if x is 4195835 and y is 3145727, then (x/y)*y-x is
exactly zero whether all calculation is done __consistently__ using 64-bit
or using 80-bit representation (the latter with a 64-bit mantissa). Of
course, the final result is stored with 64-bit representation.

(And again, __of_course__ I am only talking about the case when compile-time
optimizations are avoided.)

But instead of a single expression, Lynn [*] implemented this as div = x/y,
then chptst = div*y-x.

That worked (i.e. chptst is exactly zero) for him in the past only because
apparently he was using 64-bit representation __consistently__.

It failed to work (i.e. chptst is not exactly zero) when Excel VBA used
80-bit representation, but only because Lynn is not using 80-bit
representation __consistently__.

Indeed, it is the store into div, with only 53-bit precision, that causes
the problem. It is not "harmless" in this particular case.

That is what I meant when I said it worked (in the past) only by
coincidence. The "coincidence" was that he was using __consistent__
precision in the past, namely 64-bit representation (53-bit precision).



Martin Brown said:
This should not matter at all unless your algorithm
is extremely sensitive to roundoff - you still have
almost 16 good decimal digits.

It is not clear to me what your point is.

But in a previous response, I alluded to the fact that __in_general__, we
cannot expect (x/y)*y-x to be exactly zero, even when we use __consistent__
precision for the calculation.

(And again, __of_course__ I am only talking about the case when compile-time
optimizations are avoided.)

Here are some random examples, all using x and y with the same absolute and
relative magnitudes as Nicely's example for the FDIV defect. By that, I
mean: x and y are 7-digit numbers; and 1 < x/y < 2.

80bit   64bit   mixMode   x         y
=0      =0      <>0       4195835   3145727   (FDIV bug)
=0      =0      =0        1300666   1233646
=0      <>0     <>0       1695563   1538366
=0      <>0     =0        none
<>0     =0      <>0       1923832   1204810
<>0     =0      =0        none
<>0     <>0     <>0       1867447   1462980
<>0     <>0     =0        none


-----
[*] I don't know if Lynn invented the div/chptst implementation or he copied
it from some reference. When I asked him for a reference, he pointed me
only to the wiki FDIV bug page,
http://en.wikipedia.org/wiki/Pentium_FDIV_bug.

I do not see Lynn's div/chptst implementation on that page or on any
referenced page.

On the contrary, I do find Nicely's original pentbug.c file, which contains
a __consistent__ implementation using 64-bit representation. Excerpted:

double lfNum1=4195835.0, lfNum2=4195835.0, lfDenom1=3145727.0,
lfDenom2=3145727.0, lfQuot, lfProd, lfZero;
/* The duplicate variables are used in an effort to foil compiler
optimization and compile-time evaluation of numeric literals. */
lfQuot=lfNum1/lfDenom1;
lfProd=lfDenom2*lfQuot;
lfZero=lfNum2 - lfProd;

Aside #1.... Note that Nicely went out of his way to avoid compile-time
optimizations. Or so he hoped ;-).

If Lynn had followed Nicely's implementation correctly, he wouldn't have had
any problem with the FDIV constants despite the change in precision mode
between Watcom languages and Excel VBA and despite the Watcom compile-time
optimizations.

Aside #2.... IMHO, Nicely was wrong to use 64-bit representation to
demonstrate the FDIV bug. Luckily, it worked, but only because the FDIV
error was so large, about 8E-5. He might have overlooked more subtle errors
due to rounding the intermediate 80-bit results to 64-bit representation.
 

joeu2004

Errata.... I said:
If Lynn had followed Nicely's implementation
correctly, he wouldn't have had any problem
with the FDIV constants despite the change in
precision mode between Watcom languages and
Excel VBA

But perhaps Nicely's implementation or something nearly identical was not
available when Lynn wrote his(?) algorithm.

The pentbug.c file that I find today has only a 2011 copyright. But it
presumably was written in or before 2003, since that is the date attributed
to the pentbug.zip file on Nicely's website.

If the algorithm had been made public at some earlier date (e.g. 1995), I
would think that Nicely would know that the earlier date(s) of "publication"
should be included in the copyright.


PS.... I said:
IMHO, Nicely was wrong to use 64-bit representation
to demonstrate the FDIV bug. Luckily, it worked,
but only because the FDIV error was so large, about
8E-5. He might have overlooked more subtle errors
due to rounding the intermediate 80-bit results to
64-bit representation.

I meant to add.... I also think that it was "wrong" to use (x/y)*y-x to
demonstrate the FDIV bug in the first place.

Well, I do agree that it had "marketing appeal" insofar as it might have
made the defect more relevant to some people.

But my point is.... IMHO, that is not the correct way to test for the
presence of the defect.

Actually, it is Tim Coe who discovered the error with 4195835.0/3145727.0
per se. It represents the largest known error among the many defective FDIV
results.

And it does appear that Nicely or Coe did look at the result of the division
using 80-bit arithmetic, at least eventually.

Nicely's published results of the division are (the comma shows 15
significant digits to the left):

1.33382044913624,10025 (Correct value)
1.33373906890203,75894 (Flawed Pentium)

Note that the representation of the "correct value" is more precise than the
64-bit representation can be. The 64-bit representation is exactly
1.33382044913624,10930577198087121360003948211669921875, somewhat greater
than 1.333...,10025. The next closest 64-bit representation is exactly
1.33382044913624,0871013114883680827915668487548828125, which is too low.

Based on the information available today, I would implement the FDIV bug
test as follows.

#include <math.h>

double ident(double x) { return x; } // thwart compiler optimizations

int fdivTest() // 1 = okay; 0 = error
{
const double num = 4195835;
const double denom = 3145727;
const double quot = 1.33382044913624;
return (fabs(ident(num)/ident(denom) - quot) < 5e-15);
}

It does not matter whether that is evaluated consistently in 64-bit (53-bit
mantissa) or 80-bit (64-bit mantissa) representation or a mix.
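(A trivial driver, not part of the test itself, just to show the intended usage together with the ident and fdivTest definitions above:)

#include <stdio.h>

int main (void)
{
    printf (fdivTest () ? "FPU divide OK\n" : "FDIV defect detected\n");
    return 0;
}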

FYI, we cannot expect the division result to have the same 64-bit binary
representation as the constant 1.33382044913624.

In fact, the two closest representations of the constant are
1.33382044913623,9982834695183555595576763153076171875 and
1.33382044913624,02048793001085869036614894866943359375.
 
