[MRG+1] OPTICS Change quick_scan floating point comparison to isclose #11929
Conversation
Yes, I'm happy with this
sklearn/cluster/tests/test_optics.py (outdated)
@@ -190,226 +189,65 @@ def test_reach_dists():
# Expected values, matches 'RD' results from:
# http://chemometria.us.edu.pl/download/optics.py
Have you made sure the results indeed match with this implementation?
@lesteve I actually had not; somehow I missed this comment. But I did now, and the results are not the same, not even close actually. I also checked whether the values I'm replacing match the output of that code, and they don't either. So at some point in time, either somebody forgot to check against the code provided at that URL, or the code at that URL has changed and no longer matches what it used to be. @espg any idea?
I'll check for the IPython notebook where I generated the values; I think I remember having to sort the output by the ordering indexes to get them to match...
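For what it's worth, a minimal sketch of the kind of reordering being described, assuming the OPTICS estimator API as it exists in recent scikit-learn releases (reachability_ indexed by sample, ordering_ giving the processing order); reference_rd is a hypothetical placeholder for the other script's output:

```python
import numpy as np
from sklearn.cluster import OPTICS

X = np.random.RandomState(0).randn(100, 2)
clust = OPTICS(min_samples=10).fit(X)

# reachability_ is indexed by sample, while ordering_ gives the order in
# which points were processed; indexing one by the other yields the
# reachability values in traversal order, which is how a script that prints
# values as it processes points would report them.
reach_in_processing_order = clust.reachability_[clust.ordering_]

# A reference script's output (hypothetical `reference_rd`) would then be
# compared against this reordered array rather than reachability_ directly:
# np.allclose(reach_in_processing_order, reference_rd)
```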
sklearn/cluster/tests/test_optics.py (outdated)
C4 = [-2, 3] + .3 * rng.randn(n_points_per_cluster, 2)
C5 = [3, -2] + 1.6 * rng.randn(n_points_per_cluster, 2)
C6 = [5, 6] + 2 * rng.randn(n_points_per_cluster, 2)
n_points_per_cluster = 50
Why do you change the test here? I'm fine with reducing n_points_per_cluster, but changing .8 * rng.randn(n_points_per_cluster, 2) to 8 * rng.randn(n_points_per_cluster, 2) seems strange. It's hard to form a cluster with your version.
The issue was kinda related to floating point precision, which is why we had the idea to scale up the data. But you're right, to scale the data I also need to scale the centers. I'll do so.
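A sketch of what scaling both the spread and the centers might look like, written in the test's generation style (C1 = [-5, -2] + .8 * rng.randn(...)); the scale factor below is illustrative, not a value agreed on in this thread:

```python
import numpy as np

rng = np.random.RandomState(0)
n_points_per_cluster = 50
scale = 100  # illustrative factor; the thread later mentions upscaling by 100

# Original style: centers a few units apart, spread below ~2.
C1 = [-5, -2] + .8 * rng.randn(n_points_per_cluster, 2)

# Scaling only the spread (.8 -> 8 or 80) smears the clusters together;
# the centers have to be multiplied by the same factor to keep the same
# relative geometry at a larger absolute scale.
C1_scaled = (np.array([-5, -2]) * scale
             + .8 * scale * rng.randn(n_points_per_cluster, 2))
```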
How about keeping the original test? Or is it that the original test cannot detect your problem, but the new test can?
Also, I don't think it's good to get the expected result from our own implementation. Where does your expected result come from?
The original test fails on 32 bit systems.
The related conversation happened after this comment: #11857 (comment)
The original test fails on 32 bit systems.

I assume this PR aims to solve that problem. Seems that I'm wrong? Is it impossible to pass the original test on a 32-bit system? If so, is it possible to only reduce n_points_per_cluster? If not, then I think it's acceptable to scale up the data, as long as you get the expected result from another implementation (e.g., R).
I could do that, but are you opposed to changing the == to np.isclose when comparing floats?
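For context, a minimal illustration of the difference being proposed; nothing here is specific to OPTICS:

```python
import numpy as np

a = 0.1 + 0.2   # 0.30000000000000004 due to binary floating point
b = 0.3

print(a == b)            # False: the two doubles differ in the last bit
print(np.isclose(a, b))  # True: equal within a relative/absolute tolerance
```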
I can't find any detailed information about the code from http://chemometria.us.edu.pl/download/optics.py. Is it cited or mentioned elsewhere? Maybe it's better to compare with R?
No, we've encountered similar problems (e.g., #11551). It seems that both the old test and the new test pass on master and in this PR (correct me if I'm wrong), so I'd prefer a +1/-1 PR unless you give me a reason to change the test. But there's another problem here: it seems that the performance of OPTICS will be affected if we use np.isclose in Cython code. I'm not sure what the solution is (maybe some low-level function).
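As a rough sense of that concern: called once per pair of scalars (the pattern an inner Cython loop would have), np.isclose pays ufunc and array-conversion overhead on every call, which is presumably why a C-level helper is being considered. A quick, machine-dependent comparison:

```python
import timeit

n = 10_000

# np.isclose is a vectorised function; per-scalar calls pay Python/ufunc
# overhead on every invocation.
t_np = timeit.timeit("np.isclose(0.1 + 0.2, 0.3)",
                     setup="import numpy as np", number=n)

# math.isclose is a thin C-level function operating on two scalars.
t_math = timeit.timeit("math.isclose(0.1 + 0.2, 0.3)",
                       setup="import math", number=n)

print(t_np, t_math)  # np.isclose is markedly slower per scalar call
```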
It passes the tests because we don't build/test for 32 bit systems on this repo. That's why @jnothman sent this quick fix (#11916). The issue is not fixed yet on master. On top of that, the current test on master fails on 32 bit arch even with the … I'm trying to figure out a faster way than …
Regarding the …, I've put it in this PR, since it's supported only from Python 3.5.
Yes, but my point here is that if the referenced implementation is not cited/mentioned elsewhere (indicating that it's not a popular implementation), this implementation might have some bugs.
Agree, it's good to compare with the implementation from the original author. @espg Your contribution here will be very welcome. But I think it's not the most urgent thing; for now, we need to ensure the correctness of our implementation. @adrinjalali For the current PR, I still fail to understand why we need to change the test.
Yes, it fails on 32 bit machines, but we actually can't find the reason, right? If so, I prefer to keep the test to make the PR minimal. We've already skipped some tests on 32 bit machines. Regarding the PR itself, I'm not against getting rid of ==, but I would like to see a test which fails on master but passes in this PR. I think it will be difficult, so I won't insist on it.
Thanks @espg (though I don't have time to read them now), I'm now much more confident about using the implementation.
We know the broad reason it fails on 32bit: results rely discretely on a sort over two columns of floats. We only use the second column if adjacent values in the first are equal. But there we rely on float equality. Numerical imprecision results in different things being equal in the 32 bit case than in the 64 bit case.
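A toy reconstruction of that failure mode (this is not the actual quick_scan code; the function and data below are purely illustrative): when the tie-break on the second column is only consulted for exactly equal values in the first column, a last-bit difference decides which element wins, and that difference can vary between builds.

```python
import numpy as np

def pick(primary, secondary, close=None):
    """Index of the smallest primary value; ties broken on secondary.
    `close` decides what counts as a tie (exact equality by default)."""
    if close is None:
        close = lambda a, b: a == b
    best = 0
    for i in range(1, len(primary)):
        if close(primary[i], primary[best]):
            if secondary[i] < secondary[best]:
                best = i
        elif primary[i] < primary[best]:
            best = i
    return best

# Index 0's primary value should equal index 1's, but carries last-bit noise,
# the kind of discrepancy that can differ between 32 bit and 64 bit builds.
primary = np.array([0.1 + 0.2, 0.3])   # 0.30000000000000004 vs 0.3
secondary = np.array([1.0, 5.0])

print(pick(primary, secondary))              # 1: exact == never sees a tie
print(pick(primary, secondary, np.isclose))  # 0: tie-break on secondary fires
```

On a platform where both values round to exactly the same double, the == version would also return 0, which is how the two architectures end up following different paths.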
This is what I can add:
and comparing the values between the 32 and 64 bit archs, it's the … That means the test fails on a 32 bit arch because of a precision issue, but that precision issue happens at a point which makes the algorithm follow different paths on the two archs, i.e. …

… does NOT impose any computation penalty, i.e.: …

The above proposal would make the test pass on both archs, without any change to the test. But as @jnothman also pointed out (#11857 (comment)):

3.2. Upscale the space of the test by a factor of 100
3.3. Reduce the number of points in each cluster

I really prefer a test which passes regardless of the arch, rather than having two tests. That would mean the only thing left is to get the … Does it seem reasonable, @espg, @jnothman, @qinhanmin2014?
I'm fine with using 50 points per cluster if it fixes all of our problems; the only reason that 250 was used was to match the unit test to the documentation example. The 'chemometria' OPTICS implementation that I used for the test array is here, although note that it has a dependency of …
I'm very happy with that analysis and the proposal to reduce the number of samples per cluster.
sklearn/cluster/_optics_inner.pyx (outdated)
@@ -5,6 +5,9 @@ cimport cython
ctypedef np.float64_t DTYPE_t
ctypedef np.int_t DTYPE

cdef isclose(double a, double b, double rel_tol=1e-09, double abs_tol=0.0):
Use cdef inline rather than relying on the C compiler to work it out.
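For reference, the defaults in that signature match math.isclose, whose PEP 485 check is the usual choice for such a helper; a pure-Python sketch of that formula (it is an assumption here that the PR's helper implements the same check, and in the .pyx file it would be declared cdef inline per the suggestion above):

```python
def isclose(a, b, rel_tol=1e-09, abs_tol=0.0):
    # PEP 485: true when the difference is within rel_tol of the larger
    # magnitude, or within abs_tol in absolute terms.
    return abs(a - b) <= max(rel_tol * max(abs(a), abs(b)), abs_tol)

print(isclose(0.1 + 0.2, 0.3))  # True
print(0.1 + 0.2 == 0.3)         # False
```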
Thanks @adrinjalali and @espg
@espg I've reverted the test to master, since I couldn't generate the expected values with your script (I checked the ordering issue and a possibly missing sqrt, still no match). Could you please send a PR with 50 points in each group? No need to change the C1 to C6 definitions.
LGTM, ping @jnothman
@adrinjalali Sure, working on it now.
I'm running into weirdness... identical results between the two implementations when using n=250 per group, but divergent answers when using n=50. Not sure what to do here; apparently the implementations only agree on datasets above a certain size. So, using n=50 will cause the comparison to fail. Using 250 will cause them to agree, but then the numerical instability will cause the 32-bit comparisons to diverge.

I like this test quite a lot, as it directly verifies that the reachability values aren't changing or drifting... but I'm at a loss with the other OPTICS code and why it only agrees on large datasets. We could change to 50 points per cluster and use the output from OPTICS at merge time... that would fix the 32-bit issue and let us know if we accidentally implemented any changes that alter the core numeric output of the algorithm. But we would only be looking internally, and not checking against another independent implementation. Are we comfortable with that at this point?
Interesting. The first thing that comes to mind is the effect of min_samples. How does varying that affect agreement?
But we should also consider whether we have enough concerns with our implementation to not release in 0.20.
min_samples doesn't seem to change things much; setting different values still gives disagreement between the implementations at lower densities. There is an offset between the implementations: the Brian Clowers code doesn't include the point under consideration in the density estimation, while ours does. In practice, this means that the outputs are identical in the 250 points per cluster case if we set …
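A small illustration of that offset, assuming core distances are taken as the distance to the min_samples-th nearest neighbour; the data and k below are illustrative, not the values used in the test:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.RandomState(0)
X = rng.randn(20, 2)
k = 5

nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
dist, _ = nn.kneighbors(X)  # column 0 is each query point itself, at distance 0

# If the point under consideration counts towards its own neighbourhood,
# "min_samples = k" means the (k-1)-th column; if it doesn't, it means the
# k-th column. The two conventions differ by exactly one neighbour rank.
core_dist_including_self = dist[:, k - 1]
core_dist_excluding_self = dist[:, k]

print(core_dist_including_self[:3])
print(core_dist_excluding_self[:3])  # always >= the line above
```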
Do you think we should be excluding that point from the density estimation?
@jnothman our definition of whether or not …
Okay. So let's make sure that discrepancy from other implementations is noted in the documentation here.
Anything left here that I should do? I guess it can be merged.
Thanks @adrinjalali
See #11857, #11916, #11878, and MacPython/scikit-learn-wheels#7
The part of the fix which I believe is valid is the change to the comparison between two floats.
Other than that, scaling up the locations of the points and reducing the number of points in each group fixes the test issue while still keeping the test valid (IMO).
Does this look okay, @jnothman, @espg?