
Commit c8a2a00

fix several bugs and formatting in the fast math ulp tables (#544)
* fix several bugs and formatting in the fast math ulp tables
* add back OpFDiv to environment spec description
  unify formatting
1 parent a5a71be commit c8a2a00

2 files changed

Lines changed: 218 additions & 164 deletions


OpenCL_C.txt

Lines changed: 105 additions & 53 deletions
Original file line number | Diff line number | Diff line change
@@ -11301,7 +11301,7 @@ there was sufficient range, meets ULP error tolerance.
1130111301
| _x_ + _y_ | Correctly rounded
1130211302
| _x_ - _y_ | Correctly rounded
1130311303
| _x_ * _y_ | Correctly rounded
11304-
| *1.0 / _x_* | {leq} 2.5 ulp
11304+
| 1.0 / _x_ | {leq} 2.5 ulp
1130511305
| _x_ / _y_ | {leq} 2.5 ulp
1130611306
| |
1130711307
| *acos* | {leq} 4 ulp
@@ -11487,7 +11487,7 @@ is the infinitely precise result.
1148711487
| _x_ + _y_ | Correctly rounded
1148811488
| _x_ - _y_ | Correctly rounded
1148911489
| _x_ * _y_ | Correctly rounded
11490-
| *1.0 / _x_* | {leq} 3 ulp
11490+
| 1.0 / _x_ | {leq} 3 ulp
1149111491
| _x_ / _y_ | {leq} 3 ulp
1149211492
| |
1149311493
| *acos* | {leq} 4 ulp
@@ -11625,136 +11625,188 @@ requires>> support for OpenCL C 2.0 or newer.
1162511625

1162611626
[[table-float-ulp-relaxed]]
1162711627
.ULP values for single precision built-in math functions with unsafe math optimizations in the full and embedded profiles
11628-
[cols=",",]
11628+
[cols="3,7",]
1162911629
|====
11630-
| *Function* | *Min Accuracy - ULP values*
11631-
| *1.0 / _x_*
11630+
| *Function*
11631+
| *Minimum Accuracy*
11632+
11633+
| 1.0 / _x_
1163211634
| {leq} 2.5 ulp for _x_ in the domain of 2^-126^ to 2^126^ for the full
1163311635
profile, and {leq} 3 ulp for the embedded profile.
11636+
1163411637
| _x_ / _y_
1163511638
| {leq} 2.5 ulp for _x_ in the domain of 2^-62^ to 2^62^ and _y_ in the
1163611639
domain of 2^-62^ to 2^62^ for the full profile, and {leq} 3 ulp for
1163711640
the embedded profile.
11641+
1163811642
| *acos*(_x_)
1163911643
| {leq} 4096 ulp
11644+
11645+
| *acosh*(_x_)
11646+
| Derived implementations may implement as *log*(_x_ + *sqrt*(_x_ * _x_ - 1)).
11647+
1164011648
| *acospi*(_x_)
11641-
| Implemented as *acos*(_x_) * `M_PI_F`.
11649+
| Derived implementations may implement as *acos*(_x_) * `M_PI_F`.
1164211650
For non-derived implementations, the error is {leq} 8192 ulp.
11651+
1164311652
| *asin*(_x_)
1164411653
| {leq} 4096 ulp
11654+
11655+
| *asinh*(_x_)
11656+
| Derived implementations may implement as *log*(_x_ + *sqrt*(_x_ * _x_ + 1)).
11657+
1164511658
| *asinpi*(_x_)
11646-
| Implemented as *asin*(_x_) * `M_PI_F`.
11659+
| Derived implementations may implement as *asin*(_x_) * `M_PI_F`.
1164711660
For non-derived implementations, the error is {leq} 8192 ulp.
11661+
1164811662
| *atan*(_x_)
1164911663
| {leq} 4096 ulp
11650-
| *atan2*(_y_, _x_)
11651-
| Implemented as *atan*(_y_ / _x_) for _x_ > 0, *atan*(_y_ / _x_) +
11652-
`M_1_PI_F` for _x_ < 0 and _y_ > 0 and *atan*(_y_ / _x_) -
11653-
`M_1_PI_F` for _x_ < 0 and _y_ < 0.
11664+
11665+
//| *atanh*(_x_)
11666+
// | Defined for _x_ in the domain (-1, 1).
11667+
// For _x_ in [-2^-10^, 2^-10^], derived implementations may implement as _x_.
11668+
// For _x_ outside of [-2^-10^, 2^-10^], derived implementations may implement as
11669+
// 0.5f * *log*((1.0f + _x_) / (1.0f - _x_)).
11670+
// For non-derived implementations, the error is {leq} 8192 ulp.
11671+
1165411672
| *atanpi*(_x_)
11655-
| Implemented as *atan*(_x_) * `M_1_PI_F`.
11673+
| Derived implementations may implement as *atan*(_x_) * `M_1_PI_F`.
1165611674
For non-derived implementations, the error is {leq} 8192 ulp.
11675+
11676+
| *atan2*(_y_, _x_)
11677+
| Derived implementations may implement as *atan*(_y_ / _x_) for _x_ > 0,
11678+
*atan*(_y_ / _x_) + `M_PI_F` for _x_ < 0 and _y_ > 0, and
11679+
*atan*(_y_ / _x_) - `M_PI_F` for _x_ < 0 and _y_ < 0.
11680+
1165711681
| *atan2pi*(_y_, _x_)
11658-
| Implemented as *atan2*(_y_, _x_) * `M_PI_F`.
11682+
| Derived implementations may implement as *atan2*(_y_, _x_) * `M_1_PI_F`.
1165911683
For non-derived implementations, the error is {leq} 8192 ulp.
11660-
| *acosh*(_x_)
11661-
| Implemented as *log*(_x_ + *sqrt*(_x_ * _x_ - 1)).
11662-
| *asinh*(_x_)
11663-
| Implemented as *log*(_x_ + *sqrt*(_x_ * _x_ + 1)).
11684+
1166411685
| *cbrt*(_x_)
11665-
| Implemented as *rootn*(_x_, 3).
11686+
| Derived implementations may implement as *rootn*(_x_, 3).
1166611687
For non-derived implementations, the error is {leq} 8192 ulp.
11688+
1166711689
| *cos*(_x_)
1166811690
| For _x_ in the domain [-{pi}, {pi}], the maximum absolute error
1166911691
is {leq} 2^-11^ and larger otherwise.
11692+
1167011693
| *cosh*(_x_)
11671-
| Defined for _x_ in the domain [-88,88] and implemented as 0.5f *
11672-
(*exp*(_x_) + *exp*(-_x_)).
11694+
| Defined for _x_ in the domain [-88, 88].
11695+
Derived implementations may implement as 0.5f * (*exp*(_x_) + *exp*(-_x_)).
1167311696
For non-derived implementations, the error is {leq} 8192 ulp.
11697+
1167411698
| *cospi*(_x_)
1167511699
| For _x_ in the domain [-1, 1], the maximum absolute error is {leq}
1167611700
2^-11^ and larger otherwise.
11701+
1167711702
| *exp*(_x_)
1167811703
| {leq} 3 + *floor*(*fabs*(2 * _x_)) ulp for the full profile, and {leq}
1167911704
4 ulp for the embedded profile.
11705+
1168011706
| *exp2*(_x_)
1168111707
| {leq} 3 + *floor*(*fabs*(2 * _x_)) ulp for the full profile, and {leq}
1168211708
4 ulp for the embedded profile.
11709+
1168311710
| *exp10*(_x_)
11684-
| Derived implementations implement this as *exp2*(_x_ * *log2*(10)).
11711+
| Derived implementations may implement as *exp2*(_x_ * *log2*(10)).
1168511712
For non-derived implementations, the error is {leq} 8192 ulp.
11713+
1168611714
| *expm1*(_x_)
11687-
| Derived implementations implement this as *exp*(_x_) - 1.
11715+
| Derived implementations may implement as *exp*(_x_) - 1.
1168811716
For non-derived implementations, the error is {leq} 8192 ulp.
11717+
1168911718
| *log*(_x_)
1169011719
| For _x_ in the domain [0.5, 2] the maximum absolute error is {leq}
11691-
2^-21^; otherwise the maximum error is {leq}3 ulp for the full profile
11692-
and {leq} 4 ulp for the embedded profile
11720+
2^-21^; otherwise the maximum error is {leq} 3 ulp for the full profile
11721+
and {leq} 4 ulp for the embedded profile.
11722+
1169311723
| *log2*(_x_)
1169411724
| For _x_ in the domain [0.5, 2] the maximum absolute error is {leq}
11695-
2^-21^; otherwise the maximum error is {leq}3 ulp for the full profile
11696-
and {leq} 4 ulp for the embedded profile
11725+
2^-21^; otherwise the maximum error is {leq} 3 ulp for the full profile
11726+
and {leq} 4 ulp for the embedded profile.
11727+
1169711728
| *log10*(_x_)
1169811729
| For _x_ in the domain [0.5, 2] the maximum absolute error is {leq}
11699-
2^-21^; otherwise the maximum error is {leq}3 ulp for the full profile
11700-
and {leq} 4 ulp for the embedded profile
11730+
2^-21^; otherwise the maximum error is {leq} 3 ulp for the full profile
11731+
and {leq} 4 ulp for the embedded profile.
11732+
1170111733
| *log1p*(_x_)
11702-
| Derived implementations implement this as *log*(_x_ + 1).
11734+
| Derived implementations may implement as *log*(_x_ + 1).
1170311735
For non-derived implementations, the error is {leq} 8192 ulp.
11736+
1170411737
| *pow*(_x_, _y_)
1170511738
| Undefined for _x_ = 0 and _y_ = 0.
11706-
Undefined for _x_ < 0 and non-integer y.
11707-
Undefined for _x_ < 0 and _y_ outside the domain [-2^24, 2^24].
11708-
For _x_ > 0 or _x_ < 0 and even _y_, derived implementations implement
11709-
this as *exp2*(_y_ * *log2*(*fabs*(_x_))).
11710-
For _x_ < 0 and odd _y_, derived implementations implement this as
11739+
Undefined for _x_ < 0 and non-integer _y_.
11740+
Undefined for _x_ < 0 and _y_ outside the domain [-2^24^, 2^24^].
11741+
For _x_ > 0 or _x_ < 0 and even _y_, derived implementations may implement as
11742+
*exp2*(_y_ * *log2*(*fabs*(_x_))).
11743+
For _x_ < 0 and odd _y_, derived implementations may implement as
1171111744
-*exp2*(_y_ * *log2*(*fabs*(_x_))).
11712-
For _x_ == 0 and nonzero _y_, derived implementations return zero.
11745+
For _x_ == 0 and non-zero _y_, derived implementations may return zero.
1171311746
For non-derived implementations, the error is {leq} 8192 ulp.
1171411747
footnote:[{fn-pow-performance}]
11748+
1171511749
| *pown*(_x_, _y_)
11716-
| Defined only for integer values of y.
11750+
| Defined only for integer values of _y_.
1171711751
Undefined for _x_ = 0 and _y_ = 0.
11718-
For _x_ >= 0 or _x_ < 0 and even _y_, derived implementations
11719-
implement this as *exp2*(_y_ * *log2*(*fabs*(_x_))).
11720-
For _x_ < 0 and odd _y_, derived implementations implement this as
11721-
-*exp2*(_y_ * *log2*(*fabs*(_x_)).
11752+
For _x_ >= 0 or _x_ < 0 and even _y_, derived implementations may implement as
11753+
*exp2*(_y_ * *log2*(*fabs*(_x_))).
11754+
For _x_ < 0 and odd _y_, derived implementations may implement as
11755+
-*exp2*(_y_ * *log2*(*fabs*(_x_))).
1172211756
For non-derived implementations, the error is {leq} 8192 ulp.
11757+
1172311758
| *powr*(_x_, _y_)
1172411759
| Defined only for _x_ >= 0.
1172511760
Undefined for _x_ = 0 and _y_ = 0.
11726-
Derived implementations implement this as *exp2*(_y_ * *log2*(_x_)).
11761+
Derived implementations may implement as *exp2*(_y_ * *log2*(_x_)).
1172711762
For non-derived implementations, the error is {leq} 8192 ulp.
11763+
1172811764
| *rootn*(_x_, _y_)
11729-
| Defined for _x_ > 0 when _y_ is nonzero, derived implementations
11730-
implement this case as *exp2*(log2(_x_) / _y_).
11731-
Defined for _x_ < 0 when _y_ is odd, derived implementations implement
11732-
this case as -*exp2*(*log2*(-_x_) / _y_).
11733-
Defined for _x_ = +/-0 when _y_ > 0, derived implementations will
11765+
| Defined for _x_ > 0 when _y_ is non-zero, derived implementations
11766+
may implement this case as *exp2*(*log2*(_x_) / _y_).
11767+
Defined for _x_ < 0 when _y_ is odd, derived implementations
11768+
may implement this case as -*exp2*(*log2*(-_x_) / _y_).
11769+
Defined for _x_ = +/-0 when _y_ > 0, derived implementations may
1173411770
return +0 in this case.
1173511771
For non-derived implementations, the error is {leq} 8192 ulp.
11772+
1173611773
| *sin*(_x_)
1173711774
| For _x_ in the domain [-{pi}, {pi}], the maximum absolute error is
1173811775
{leq} 2^-11^ and larger otherwise.
11776+
1173911777
| *sincos*(_x_)
11740-
| ulp values as defined for *sin*(_x_) and *cos*(_x_)
11778+
| ulp values as defined for *sin*(_x_) and *cos*(_x_).
11779+
1174111780
| *sinh*(_x_)
1174211781
| Defined for _x_ in the domain [-88,88].
11743-
For _x_ in [-2^-10,2^-10], derived implementations implement as _x_.
11744-
For _x_ outside of [-2^10,2^10], derived implement as *0.5f *
11745-
(*exp*(_x_) - *exp*(-_x_)).
11782+
For _x_ in [-2^-10^, 2^-10^], derived implementations
11783+
may implement as _x_.
11784+
For _x_ outside of [-2^-10^, 2^-10^], derived implementations
11785+
may implement as 0.5f * (*exp*(_x_) - *exp*(-_x_)).
1174611786
For non-derived implementations, the error is {leq} 8192 ulp.
11787+
1174711788
| *sinpi*(_x_)
1174811789
| For _x_ in the domain [-1, 1], the maximum absolute error is {leq}
1174911790
2^-11^ and larger otherwise.
11791+
1175011792
| *tan*(_x_)
11751-
| Derived implementations implement this as *sin*(_x_) * (`1.0f` /
11752-
*cos*(_x_)).
11793+
| Derived implementations may implement as
11794+
*sin*(_x_) * (1.0f / *cos*(_x_)).
1175311795
For non-derived implementations, the error is {leq} 8192 ulp.
11796+
11797+
//| *tanh*(_x_)
11798+
// | Defined for _x_ in the domain [-{inf}, {inf}].
11799+
// For _x_ in [-2^-10^, 2^-10^], derived implementations
11800+
// may implement as _x_.
11801+
// For _x_ outside of [-2^-10^, 2^-10^], derived implementations
11802+
// may implement as (*exp*(_x_) - *exp*(-_x_)) / (*exp*(_x_) + *exp*(-_x_)).
11803+
// For non-derived implementations, the error is {leq} 8192 ULP.
11804+
1175411805
| *tanpi*(_x_)
11755-
| Derived implementations implement this as *tan*(_x_ * `M_PI_F`).
11806+
| Derived implementations may implement as *tan*(_x_ * `M_PI_F`).
1175611807
For non-derived implementations, the error is {leq} 8192 ulp for _x_
1175711808
in the domain [-1, 1].
11809+
1175811810
| _x_ * _y_ + _z_
1175911811
| Implemented either as a correctly rounded *fma* or as a multiply and
1176011812
an add both of which are correctly rounded.
