exp bug in Virtual Pascal and bpl70v20

Discussion:

(too old to reply)

Wolfgang Ehrhardt

2011-01-15 16:54:01 UTC

Virtual Pascal and Borland Pascal with Robert Prins' bpl70v20 have an
enhanced exp routine based on a code snippet from Norbert Juffa. This
is more accurate than Borland's standard Pascal/Delphi exp or older VP
routines.

The key idea is to calculate exp(x) = 2^(log2(e)*x - 1) + 1 with extra
precision for frac(log2(e)*x) and then use the x87 instruction f2xm1
to compute 2^frac(log2(e)*x)-1. f2xm1 requires that the abs value of
its argument must be <= 1.

Unfortunately the Juffa's code miserably fails for some x values if
the "round up toward +infinity" is used. Example:

program t_expbug;
{$N+}
const
rmUp: word = $1B72;
var
x,y: extended;
begin
asm
fldcw [rmUp]
end;
x := 20.0 + 0.7944154167983592825;
y := exp(x);
writeln(x:18:14, y:27, 1073741824.0:27);
x := 2*x;
y := exp(x);
writeln(x:18:14, y:27, 1152921504606846976.0:27);
end.

The output is for both Virtual Pascal 2.1.279 and BP7/bpl70v20:

20.79441541679836 -2.32830643653869629E-10 1.07374182400000000E+09
41.58883083359672 -5.00000000000000000E-01 1.15292150460684698E+18

A fix is to test frac() against 1 and correct. I guess this is
faster than temporarily change the rounding control to "round to
nearest".

Here is a corrected version (suppressing most comments in order to
display properly in newsreaders)

const
ln2_hi: array[0..7] of byte = ($00,$00,$E0,$FE,$42,$2E,$E6,$3F);
ln2_lo: array[0..7] of byte = ($76,$3C,$79,$35,$EF,$39,$EA,$3D);

function exp(x: extended): extended;assembler;{&Frame-}{&Uses none}
{-Accurate and debugged exp}
asm
{Based on Norbert Juffa's exp}
fld [x]
fldl2e
fmul st,st(1)
frndint
fld qword ptr [ln2_hi]
fmul st,st(1)
fsubp st(2),st
fld qword ptr [ln2_lo]
fmul st, st(1)
fsubp st(2),st
fxch st(1)
fldl2e
fmulp st(1),st

{It may happen (especially for rounding modes other than
"round to nearest") that |frac(z)| > 1. In this case the
result of f2xm1 is undefined. The next lines will test
frac(z) and truncate it to +1 or -1 if necessary.}

fld st
fabs
fld1
fcompp
fstsw ax
sahf
jae @@1
fldz
fcompp
sahf
fld1
jae @@1
fchs
@@1:

f2xm1
fld1
faddp st(1),st
fscale
fstp st(1)
fwait
end;

Any comments or improvements?
Wolfgang

--
Do not use the e-mail address from the header! You can get a
valid address from http://home.netsurf.de/wolfgang.ehrhardt
(Free open source Crypto, CRC/Hash, MPArith for Pascal/Delphi)

Robert AH Prins

2011-01-16 00:37:37 UTC

Permalink

Post by Wolfgang Ehrhardt
Virtual Pascal and Borland Pascal with Robert Prins' bpl70v20 have an
enhanced exp routine based on a code snippet from Norbert Juffa. This
is more accurate than Borland's standard Pascal/Delphi exp or older VP
routines.
The key idea is to calculate exp(x) = 2^(log2(e)*x - 1) + 1 with extra
precision for frac(log2(e)*x) and then use the x87 instruction f2xm1
to compute 2^frac(log2(e)*x)-1. f2xm1 requires that the abs value of
its argument must be<= 1.
Unfortunately the Juffa's code miserably fails for some x values if
program t_expbug;
{$N+}
const
rmUp: word = $1B72;
var
x,y: extended;
begin
asm
fldcw [rmUp]
end;
x := 20.0 + 0.7944154167983592825;
y := exp(x);
writeln(x:18:14, y:27, 1073741824.0:27);
x := 2*x;
y := exp(x);
writeln(x:18:14, y:27, 1152921504606846976.0:27);
end.
20.79441541679836 -2.32830643653869629E-10 1.07374182400000000E+09
41.58883083359672 -5.00000000000000000E-01 1.15292150460684698E+18
A fix is to test frac() against 1 and correct. I guess this is
faster than temporarily change the rounding control to "round to
nearest".
Here is a corrected version (suppressing most comments in order to
display properly in newsreaders)
const
ln2_hi: array[0..7] of byte = ($00,$00,$E0,$FE,$42,$2E,$E6,$3F);
ln2_lo: array[0..7] of byte = ($76,$3C,$79,$35,$EF,$39,$EA,$3D);
function exp(x: extended): extended;assembler;{&Frame-}{&Uses none}
{-Accurate and debugged exp}
asm
{Based on Norbert Juffa's exp}
fld [x]
fldl2e
fmul st,st(1)
frndint
fld qword ptr [ln2_hi]
fmul st,st(1)
fsubp st(2),st
fld qword ptr [ln2_lo]
fmul st, st(1)
fsubp st(2),st
fxch st(1)
fldl2e
fmulp st(1),st
{It may happen (especially for rounding modes other than
"round to nearest") that |frac(z)|> 1. In this case the
result of f2xm1 is undefined. The next lines will test
frac(z) and truncate it to +1 or -1 if necessary.}
fld st
fabs
fld1
fcompp
fstsw ax
sahf
fldz
fcompp
sahf
fld1
fchs
f2xm1
fld1
faddp st(1),st
fscale
fstp st(1)
fwait
end;
Any comments or improvements?

I've forwarded your code to the grandmaster of everything FPU, let's
wait and see.

Robert

--
Robert AH Prins
spamtrap(a)prino(d)org

Robert AH Prins

2011-01-16 02:29:45 UTC

Permalink

Post by Robert AH Prins

I've forwarded your code to the grandmaster of everything FPU, let's
wait and see.

His initial reply:

Robert:

Math library functions targetting x87 were typically designed with the
assumption that the dynamic rounding control of the x87 FPU at entry is
set to round-to-nearest-or-even. That is certainly the specification to
which I designed this code, so the results observed are a consequence of
operating my code outside its design specifications.

Does documentation for Borland Pascal (or Virtual Pascal) state anywhere
that common math library routines are guaranteed to work correctly if a
user changes the x87 rounding mode away from the default setting, prior
to invoking such a function? If so, what is the stated definition of the
correct behavior in such a case?
For example, the specification may state that the result returned must
be the same for all rounding modes, which may require each function to
use a wrapper that saves the current rounding mode, sets it to round to
nearest, then restores it at the end. Obviously this would be slow.

I will take a quick look to see if there is a small modification to the
code that gets results closer to what the users expects, but no promises.

I'll get back to you by Sunday night.

-- Norbert

--
Robert AH Prins
spamtrap(a)prino(d)org

Robert AH Prins

2011-01-16 20:13:26 UTC

Permalink

Post by Robert AH Prins
Math library functions targetting x87 were typically designed with the
assumption that the dynamic rounding control of the x87 FPU at entry is
set to round-to-nearest-or-even. That is certainly the specification to
which I designed this code, so the results observed are a consequence of
operating my code outside its design specifications.

Here is a quote from the Virtual Pascal bug list found at
[quote]...
In the old days, F2XM1 was limited to arguments in
[-0.5, 0.5]. But since the 387, F2XM1 can accecpt
arguments in [-1, 1].
So, we can split the argument into an integer and a
fraction part using FRNDINT and the fraction part
will always be -1<= f<= 1 no matter what rounding
control. This means we don't have to load/restore the
FPU control word (CW) which is slow on modern
OOO FPUs (since FLDCW is a serializing instruction).
....
-- Norbert
[/quote]
So at least all rounding modes are supposed to work. But I see no
way to PROVE that even for "round to nearest" the calculated
fraction part will ALWAYS be in [-1, 1].
Up do now, I have only found errors (more correct: negative exp
values) for the rmUp rouding mode for arguments near x = k*ln(2)*2^e
with (k=9,11,18,22,30,33,36,41,44,47,60,63) and values -2<= e<= 10.

Post by Robert AH Prins
Does documentation for Borland Pascal (or Virtual Pascal) state anywhere
that common math library routines are guaranteed to work correctly if a
user changes the x87 rounding mode away from the default setting, prior
to invoking such a function? If so, what is the stated definition of the
correct behavior in such a case?
For example, the specification may state that the result returned must
be the same for all rounding modes, which may require each function to
use a wrapper that saves the current rounding mode, sets it to round to
nearest, then restores it at the end. Obviously this would be slow.

I think, the RTLs of Borland (Pascal and Delphi) or Virtual Pascal are
very badly designed for the math functions. Most (all?) of them return
extended result but very few (with the possible exceptions sqrt and
ln, because these can be directly mapped to the FPU instructions)
return correctly rounded results. A better designed RTL would return
correctly rounded double results (round to nearest).
Other rounding modes could have relaxed accuracy requirements but
they should to return totally wrong results.

Two more replies from Norbert Juffa:

=== 1 ===
So far I am unable to reproduce this problem. I don't have a working
Pascal installation anymore, so I tried with inline assembler inside
Microsoft C. Maybe I don't have quite the right input operand. What is
the byte pattern (10 bytes) for x (i.e. 20.0 + 0.7944154167983592825)
that Pascal produces ?

From code inspection, it's not quite clear to me how we could wind up
with a result that is too large (i.e. > 1.0 prior to F2XM1). If the
rounding mode is "up", each of the FMUL products with ln2 should be
slightly too large, and subtracting that from the input should result in
a fraction that is too small, not too large. At least if all operands
are positive. Having a repro case might at least lead to better
understanding as to what is going on.

I doubt that there is an easy way to make the code functional regardless
of rounding mode; certainly analysis would be very tricky as all
intermediate operations are subject to the direct rounding mode. As I
said the code was not designed with directed rounding in mind, as I
recall I designed the code for use with Digital Fortran originally.

=== 2 ===
I got repro. The failing input is 4003_A65AF678_54B28211. With rounding
to positive infinity, multiplying the input by L2E gives 30.0 + 1 ulp.
FRNDINT then rounds this up further to 31.0. Things are already
irreparably screwed up at this point. In any event, the input to F2XM1
becomes -(1.0 + 1 ulp), which is clearly out of bounds. On my CPU it
returns the unmodified input operand in this case.

The only reasonable workaround for this that I can see is to reduce the
input to F2XM1 further, to make sure we are definitely inside the
interval supported by the hardware. In other words, replace F2MX1 by the
following instruction sequence:

fmul dword ptr [half] ; 0.5*frac(z) int(z)
f2mx1 ; 2^(0.5*frac(z))-1 int(z)
fld st(0) ; 2^(0.5*frac(z))-1 2^(0.5*frac(z))-1 int(z)
fadd dword ptr [two] ; 2^(0.5*frac(z))+1 2^(0.5*frac(z))-1 int(z)
fmulp st(1),st ; 2^frac(z)-1 int(z)

If there is consensus that Pascal math functions shall work properly
when the dynamic rounding mode is not round-to-nearest-even, I believe
this is the way to go (least impact on execution time).

I dug out my old Pascal manuals and can't find anything about the
prescribed behavior of math functions when a user dials in some rounding
mode different from the default. Given the naive approach to exp()
computation in Borland's original library, it is quite possible that it
happened to work as desired in round-up mode. My question is: is that to
be considered an implementation artifact or a feature?
=== ===

I guess the above is the best than can be done, if you really need
non-default rounding.

Thanks Norbert!

Robert

--
Robert AH Prins
spamtrap(a)prino(d)org

Wolfgang Ehrhardt

2011-01-16 22:28:00 UTC

Permalink

On Sun, 16 Jan 2011 20:13:26 +0000, Robert AH Prins
...

Post by Robert AH Prins
I got repro. The failing input is 4003_A65AF678_54B28211. With rounding
to positive infinity, multiplying the input by L2E gives 30.0 + 1 ulp.
FRNDINT then rounds this up further to 31.0. Things are already
irreparably screwed up at this point. In any event, the input to F2XM1
becomes -(1.0 + 1 ulp), which is clearly out of bounds. On my CPU it
returns the unmodified input operand in this case.

Meanwhile I found that the unmodified code has a lot more errors for
rmTruncate with negative x arguments: here are the smallest values
found (also with hex dumps for better reference), the positive
arguments give wrong results with rmUp the negative with rmTruncate.

x exp(x) x as LSB Hex
-6.238324625040 -1.27054942088145051E-0021 C001C7A05AF6CC0968E2
-7.624618986159 -3.17637355220362627E-0022 C001F3FCE0F4C07D474D
-8.317766166719 -5.29395592033937712E-0023 C002851591F9DD5B9B41
-9.010913347279 -7.94093388050906568E-0023 C002902CB3795A7892DC

6.238324625040 -1.11022302462515654E-0016 4001C7A05AF6CC0968E1
7.624618986159 -4.44089209850062616E-0016 4001F3FCE0F4C07D474C
12.476649250079 -1.13686837721616030E-0013 4002C7A05AF6CC0968E1
15.249237972319 -1.81898940354585648E-0012 4002F3FCE0F4C07D474C

Post by Robert AH Prins
The only reasonable workaround for this that I can see is to reduce the
input to F2XM1 further, to make sure we are definitely inside the
interval supported by the hardware. In other words, replace F2MX1 by the
fmul dword ptr [half] ; 0.5*frac(z) int(z)
f2mx1 ; 2^(0.5*frac(z))-1 int(z)
fld st(0) ; 2^(0.5*frac(z))-1 2^(0.5*frac(z))-1 int(z)
fadd dword ptr [two] ; 2^(0.5*frac(z))+1 2^(0.5*frac(z))-1 int(z)
fmulp st(1),st ; 2^frac(z)-1 int(z)

This code does not show the errors and seems to work ok (Note that
there is a small type f2mx1 must be changed to f2xm1; and the
constants should be single). The priced to pay is a small decrease in
accuracy, here the result for 20000 random arguments with standard
round to nearest (the error is calculated with my MPArith routines):

------ Old version ------
Test of amath.exp
at 10000 random values in [-100.0000000000 .. 100.0000000000]
RMS = 0.24, max rel = 0.71 eps at
x(ext) = 7.24700653776128467E+0001 = $400590F0AC68BFA87B80
y(ext) = 2.97405843003663398E+0031 = $4067BBB0815EA0230490
y(mpf) = +2.97405843003663397839071382253654488039888365036E+31

Test of amath.exp
at 10000 random values in [100.0000000000 .. 10000.0000000000]
RMS = 0.24, max rel = 0.72 eps at
x(ext) = 4.80456952900561126E+0003 = $400B96248E653929CA47
y(ext) = 3.96309394729273526E+2086 = $5B12B8A5DD42BAF8E736
y(mpf) = +3.96309394729273526538566268155478642708748950161E+2086

------ New version ------
Test of amath.exp
at 10000 random values in [-100.0000000000 .. 100.0000000000]
RMS = 0.25, max rel = 0.83 eps at
x(ext) = 1.83770128884148676E+0001 = $400393041F554F4C8000
y(ext) = 9.57271857222288938E+0007 = $4019B695CA371C7FC4DE
y(mpf) = +9.57271857222288937714622943945524368866640537394E+7

Test of amath.exp
at 10000 random values in [100.0000000000 .. 10000.0000000000]
RMS = 0.24, max rel = 0.83 eps at
x(ext) = 5.98844546289204336E+0003 = $400BBB23904ED947432E
y(ext) = 5.60815118133691577E+2600 = $61BEB51753447EF41B4A
y(mpf) = +5.60815118133691576592733747663153598713873826468E+2600

Post by Robert AH Prins
If there is consensus that Pascal math functions shall work properly
when the dynamic rounding mode is not round-to-nearest-even, I believe
this is the way to go (least impact on execution time).
I dug out my old Pascal manuals and can't find anything about the
prescribed behavior of math functions when a user dials in some rounding
mode different from the default. Given the naive approach to exp()
computation in Borland's original library, it is quite possible that it
happened to work as desired in round-up mode. My question is: is that to
be considered an implementation artifact or a feature?

I guess it is a pure accident, especially as Borland Pascal has no
library functions for getting/setting rounding modes. The situation is
a bit different for Virtual Pascal and Delphi. But IMO at least the
Delphi math unit designer/implementers did not understand there task.
My AMath unit is an attempt to overcome these Delphi shortcomings.

Robert: Please forward a big "Thank you" to Norbert.

Wolfgang

--
Do not use the e-mail address from the header! You can get a
valid address from http://home.netsurf.de/wolfgang.ehrhardt
(Free open source Crypto, CRC/Hash, MPArith for Pascal/Delphi)

Dr J R Stockton

2011-01-17 22:39:37 UTC

Permalink

Post by Robert AH Prins
So far I am unable to reproduce this problem. I don't have a working
Pascal installation anymore, so I tried with inline assembler inside
Microsoft C. Maybe I don't have quite the right input operand. What is
the byte pattern (10 bytes) for x (i.e. 20.0 + 0.7944154167983592825)
that Pascal produces ?

I have JavaScript forms to answer that FOR SINGLE AND DOUBLE ONLY at
<http://www.merlyn.demon.co.uk/js-exact.htm>. I could think about
extending them to EXTENDED, but, as I didn't do it at the time, there
may be a difficulty - yes, there is.

A Web search for John Herbster (JohnH) might locate something useful.

Come to think of it, try in
<http://www.merlyn.demon.co.uk/programs/00index.htm> which lists and
links
floatval.pas floatval.exe (BP7 DOS mode)
sixbytes.pas
tenbytes.pas
though I don't recall what they do (but I can compile them).

OTOH, maybe you only need a compiler and a cast to array of byte.

--
(c) John Stockton, nr London UK. ?@merlyn.demon.co.uk Turnpike v6.05 MIME.
<http://www.merlyn.demon.co.uk/> TP/BP/Delphi/&c., FAQqy topics & links;
<http://www.merlyn.demon.co.uk/clpb-faq.txt> RAH Prins : c.l.p.b mFAQ;
<ftp://garbo.uwasa.fi/pc/link/tsfaqp.zip> Timo Salmi's Turbo Pascal FAQ.