Skip to content

bpo-41513: Expand comments #22123

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Sep 6, 2020
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
26 changes: 21 additions & 5 deletions Modules/mathmodule.c
Original file line number Diff line number Diff line change
Expand Up @@ -2419,9 +2419,9 @@ To avoid overflow/underflow and to achieve high accuracy giving results
that are almost always correctly rounded, four techniques are used:

* lossless scaling using a power-of-two scaling factor
* accurate squaring using Veltkamp-Dekker splitting
* compensated summation using a variant of the Neumaier algorithm
* differential correction of the square root
* accurate squaring using Veltkamp-Dekker splitting [1]
* compensated summation using a variant of the Neumaier algorithm [2]
* differential correction of the square root [3]

The usual presentation of the Neumaier summation algorithm has an
expensive branch depending on which operand has the larger
Expand Down Expand Up @@ -2456,7 +2456,11 @@ Given that csum >= 1.0, we have:
Since lo**2 is less than 1/2 ulp(csum), we have csum+lo*lo == csum.

To minimize loss of information during the accumulation of fractional
values, each term has a separate accumulator.
values, each term has a separate accumulator. This also breaks up
sequential dependencies in the inner loop so the CPU can maximize
floating point throughput. [5] On a 2.6 GHz Haswell, adding one
dimension has an incremental cost of only 5ns -- for example when
moving from hypot(x,y) to hypot(x,y,z).

The square root differential correction is needed because a
correctly rounded square root of a correctly rounded sum of
Expand All @@ -2466,20 +2470,32 @@ The differential correction starts with a value *x* that is
the difference between the square of *h*, the possibly inaccurately
rounded square root, and the accurately computed sum of squares.
The correction is the first order term of the Maclaurin series
expansion of sqrt(h**2 + x) == h + x/(2*h) + O(x**2).
expansion of sqrt(h**2 + x) == h + x/(2*h) + O(x**2). [4]

Essentially, this differential correction is equivalent to one
refinement step in Newton's divide-and-average square root
algorithm, effectively doubling the number of accurate bits.
This technique is used in Dekker's SQRT2 algorithm and again in
Borges' ALGORITHM 4 and 5.

Without proof for all cases, hypot() cannot claim to be always
correctly rounded. However for n <= 1000, prior to the final addition
that rounds the overall result, the internal accuracy of "h" together
with its correction of "x / (2.0 * h)" is at least 100 bits. [6]
Also, hypot() was tested against a Decimal implementation with
prec=300. After 100 million trials, no incorrectly rounded examples
were found. In addition, perfect commutativity (all permutations are
exactly equal) was verified for 1 billion random inputs with n=5. [7]

References:

1. Veltkamp-Dekker splitting: http://csclub.uwaterloo.ca/~pbarfuss/dekker1971.pdf
2. Compensated summation: http://www.ti3.tu-harburg.de/paper/rump/Ru08b.pdf
3. Square root differential correction: https://arxiv.org/pdf/1904.09481.pdf
4. https://www.wolframalpha.com/input/?i=Maclaurin+series+sqrt%28h**2+%2B+x%29+at+x%3D0
5. https://bugs.python.org/file49439/hypot.png
6. https://bugs.python.org/file49435/best_frac.py
7. https://bugs.python.org/file49448/test_hypot_commutativity.py

*/

Expand Down