

@rhettinger rhettinger commented Aug 30, 2020

When summing the squares, the lossy step occurs during the accumulation of fractional values. We can improve accuracy by keeping a separate accumulator for the frac += lo * lo step so that the small fractional values don't get overpowered by the larger fractional values.
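To make the idea concrete, here is a minimal Python sketch of a compensated sum of squares that keeps a separate accumulator for the `lo * lo` terms. It is an illustration under stated assumptions, not the merged C code: the names `split`, `scaled_sum_of_squares`, `frac`, and `frac_lo` are hypothetical, inputs are assumed to be already scaled so that `abs(x) <= 1.0`, and the running sum is assumed to start at 1.0 so that it always dominates each term being added.

```python
from fractions import Fraction

T27 = 134217729.0  # 2**27 + 1, the constant used for Veltkamp splitting


def split(x):
    """Split x into hi + lo, where hi keeps roughly the upper half of the bits."""
    t = x * T27
    hi = t - (t - x)
    lo = x - hi
    return hi, lo


def scaled_sum_of_squares(vec):
    """Sum x*x over vec, assuming every abs(x) <= 1.0 (hypothetical helper)."""
    csum = 1.0      # main running sum, started at 1.0 so it dominates each term
    frac = 0.0      # rounding errors recovered from the additions into csum
    frac_lo = 0.0   # separate accumulator for the tiny lo*lo terms
    for x in vec:
        hi, lo = split(x)
        # x*x == hi*hi + 2*hi*lo + lo*lo; the first two products are exact.
        for term in (hi * hi, 2.0 * hi * lo):
            old = csum
            csum += term
            frac += (old - csum) + term   # exact error of the addition above
        frac_lo += lo * lo                # kept apart from the larger frac values
    return csum - 1.0 + (frac_lo + frac)


# Compare against exact rational arithmetic for one small input.
vec = [0.1, 0.2, 0.3, 1.0]
exact = sum(Fraction(x) ** 2 for x in vec)
print(float(Fraction(scaled_sum_of_squares(vec)) - exact))
```

Adding `frac_lo` to `frac` only once, at the end, is what keeps the small `lo * lo` contributions from being rounded away inside the loop.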

As a nice side benefit, the generated code is slightly smaller: it saves two movapd instructions without increasing the number of registers in use. Also, the flow graph has fewer sequential dependencies to interfere with pipelining and parallel execution.

I compared having one, two, or three separate fractional value accumulators. For all dimensions, two accumulators are more accurate than just one. Up to a dozen or so dimensions, two accumulators are more accurate than three. See the analysis code at https://bugs.python.org/file49435/best_frac.py
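A comparison of this kind could be set up roughly as follows. This is a hedged sketch and not the contents of the linked best_frac.py: the functions `sumsq`, `relative_error`, and `mean_error` are hypothetical, the random inputs and the mapping of terms to accumulators are assumptions, and exact rational arithmetic serves as the reference.

```python
import random
from fractions import Fraction

T27 = 134217729.0  # 2**27 + 1, Veltkamp splitting constant

# Which accumulator receives the error of hi*hi, the error of 2*hi*lo, and lo*lo.
SLOTS = {1: (0, 0, 0), 2: (0, 0, 1), 3: (0, 1, 2)}


def sumsq(vec, n_acc):
    """Compensated sum of squares using n_acc fractional accumulators."""
    csum = 1.0
    fracs = [0.0, 0.0, 0.0]
    slots = SLOTS[n_acc]
    for x in vec:
        t = x * T27
        hi = t - (t - x)
        lo = x - hi
        # zip stops after the two exact products; lo*lo is handled below.
        for term, slot in zip((hi * hi, 2.0 * hi * lo), slots):
            old = csum
            csum += term
            fracs[slot] += (old - csum) + term
        fracs[slots[2]] += lo * lo
    return csum - 1.0 + (fracs[2] + fracs[1] + fracs[0])


def relative_error(vec, n_acc):
    """Relative error of sumsq() against the exact rational result."""
    exact = sum(Fraction(x) ** 2 for x in vec)
    return float(abs(Fraction(sumsq(vec, n_acc)) - exact) / exact)


def mean_error(n_acc, dims, trials, seed=0):
    """Average relative error over random vectors scaled so the max is 1.0."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        vec = [rng.random() for _ in range(dims)]
        m = max(vec)
        total += relative_error([x / m for x in vec], n_acc)
    return total / trials


for n_acc in (1, 2, 3):
    print(n_acc, mean_error(n_acc, dims=5, trials=1000))
```

Running the loop over a range of `dims` values gives one way to see how the accumulator count interacts with dimension; the numbers it prints depend on the input distribution assumed here.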

https://bugs.python.org/issue41513

@rhettinger rhettinger assigned tim-one and unassigned tim-one Aug 30, 2020
@rhettinger rhettinger changed the title bpo-41513: Further improve accuracy of hypot() Further improve accuracy of math.hypot() Aug 30, 2020
@rhettinger rhettinger merged commit 92c3816 into python:master Aug 30, 2020
xzy3 pushed a commit to xzy3/cpython that referenced this pull request Oct 18, 2020