[MRG] Fix euclidean_distances numerical instabilities #13410
I might say something stupid, but I'm not sure what you're trying to avoid there.
If a compiler optimizes aggressively enough to rearrange your arithmetic operations in an unsafe way, then I don't think fusing two independent loops would be a problem for it.
Additionally, AFAIK, most (if not all) compilers nowadays know pretty well how floating point arithmetic works.
Moreover, I think I see a few opportunities to improve the numeric accuracy by fusing the loops.
I was also wondering about this -- I imagine there should be no unsafe optimizations unless you are using non-standard flags (e.g. `-Ofast`)?
I thought that too, but this is not what I observe. I don't really know what's going on. The loop should be the plain squared-difference accumulation, but even without the `-Ofast` flag there is some kind of optimization, because the result is not correct. It looks like gcc expands the expression algebraically, which might be wrong in floating point arithmetic (see the sketch below).
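(A small numpy illustration of the effect being described, assuming the expansion in question is the usual quadratic one, `(x - y)**2` rewritten as `x*x - 2*x*y + y*y`; this reproduces the numerical consequence in float32, not the compiler behavior itself.)

```python
import numpy as np

# Direct squared difference vs. the algebraically expanded form, in
# float32. The expansion is exact in real arithmetic, but here the terms
# x*x, 2*x*y and y*y are all ~1 while the true result is ~4e-8, far below
# float32's rounding granularity at 1, so the expansion collapses to 0.0.
x = np.float32(1.0)
y = np.float32(1.0 + 2e-4)

direct = (x - y) * (x - y)                        # ~4.0e-8, accurate
expanded = x * x - np.float32(2) * x * y + y * y  # 0.0 here

print(direct, expanded)
```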
Make sure to add a unit test that detects bad numeric optimization!
How would you do that?
I added tests for the euclidean distances which failed when I didn't split the loop or when I used the `-Ofast` flag. Now they pass. Is that what you mean?
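(A rough sketch of the kind of test described, not the PR's actual test: rows that are nearly identical in float32, where a badly optimized loop destroys the small distances, checked against a float64 reference.)

```python
import numpy as np
from sklearn.metrics.pairwise import euclidean_distances

# Nearly identical float32 rows: the true distance is ~1e-3, which an
# unsafe float32 code path can lose entirely to cancellation.
rng = np.random.RandomState(0)
x = rng.rand(1, 100).astype(np.float32)
y = x + 1e-4  # stays float32

expected = euclidean_distances(x.astype(np.float64), y.astype(np.float64))
actual = euclidean_distances(x, y)

np.testing.assert_allclose(actual, expected, rtol=1e-5)
```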
Parts (terms) of the last equation.
A similar loss likely happens here: `(tmp * tmp) - (xi * xi)`. When `tmp` and `xi` are of similar magnitude, this equation suffers catastrophic cancellation (here, this happens when `yi` is small).
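(A small numpy illustration, assuming `tmp = xi - yi` as in the surrounding code; the values are made up to make the cancellation visible.)

```python
import numpy as np

# With yi small, tmp is close to xi, so tmp*tmp - xi*xi subtracts two
# nearly equal ~1e6 values. In float32 this yields -0.25 here, while the
# true value (computed from the same inputs in float64) is about -0.2.
xi, yi = np.float32(1000.0), np.float32(1e-4)

tmp = xi - yi
naive = tmp * tmp - xi * xi  # float32: -0.25

xi64, yi64 = np.float64(xi), np.float64(yi)
tmp64 = xi64 - yi64
reference = tmp64 * tmp64 - xi64 * xi64  # ~ -0.2

print(naive, reference)
```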
Neither am I, so take everything I say here with a grain of salt.
First, I'd like to make a note on the vocabulary around floating point arithmetic. None of your transformations are exactly equivalent: floating point arithmetic is commutative, but not associative. In general `(a+b)+c != a+(b+c)`. You can't reorder a sum as you wish without changing the result. That might improve the accuracy, but it might also make it worse. You usually want to add values that are not too far from each other, in order to avoid cancellation.
As a side note about summation, numpy implements a pairwise summation which has better accuracy than the straightforward sum and is faster (but less accurate) than Kahan summation.
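(A quick demonstration of both points, with made-up values; `math.fsum` serves as an exact reference.)

```python
import math
import numpy as np

# Reordering a float sum changes the result: 1.0 is absorbed when added
# to 1e16 first, but survives when 1e16 and -1e16 cancel first.
a, b, c = 1.0, 1e16, -1e16
print((a + b) + c)  # 0.0
print(a + (b + c))  # 1.0

# np.sum uses pairwise summation; math.fsum is exact. A naive
# left-to-right loop over the same data drifts further from the truth.
values = [0.1] * 10**6
naive = 0.0
for v in values:
    naive += v
print(naive, np.sum(values), math.fsum(values))
```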
What makes you say so? How did you measure it?
Well, just that the tests fail when I regroup the 2 loops :)
Which test? I can't reproduce.
Would using float64 accumulators here help precision and reduce the (low) risk of overflow? Related to #13010
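(A plain-Python stand-in for what that could look like, not the PR's Cython code; `sqeuclidean_f64_acc` is a hypothetical name.)

```python
import numpy as np

# Keep the inputs float32 but accumulate the squared differences in a
# float64, casting back to float32 only at the end.
def sqeuclidean_f64_acc(x, y):
    acc = np.float64(0.0)
    for xi, yi in zip(x, y):
        tmp = np.float64(xi) - np.float64(yi)
        acc += tmp * tmp
    return np.float32(acc)

x = np.array([1.0, 2.0, 3.0], dtype=np.float32)
y = np.array([1.0001, 2.0001, 3.0001], dtype=np.float32)
print(sqeuclidean_f64_acc(x, y))
```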
Does it force conversion of `tmp` in `result += tmp * tmp` in that case? Also, we'd need to convert the result to float32 when we want float32.
I should test how it impacts performance.
The cost is quite high: in the example I benchmarked, there's a 20% slowdown using a float64 accumulator.
I guess the precision is fine here, since it's the exact calculation method.
I'm not sure about the risk of overflow. If the result is too big to be represented in float32, then even if you can compute its correct value in float64, you'll get an inf anyway when you downcast it to float32 in the end.
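(The downcast concern in two lines:)

```python
import numpy as np

# A value above float32's maximum (~3.4e38) is held exactly in float64,
# but still becomes inf once cast back to float32.
exact = np.float64(1e39)
print(np.float32(exact))  # inf
```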
It might be a bit surprising that these implementations are in different modules. Why did you want them separated?
I don't want them to be separated. I had to separate them because I wanted to compile `pairwise_fast` with the `-ffast-math` flag. The thing is that with this flag, gcc performs unsafe optimizations for the `_safe_euclidean_sparse` functions.
The reason I wanted to use this flag for `pairwise_fast` is that it's much faster, and it should be safe because there's no operation that gcc could optimize in an unsafe way. By the way, scipy uses this flag for its euclidean distance.
I kept that change because it was wrong: the output is never sparse.
vector -> vectors