[MRG+1] OPTICS Change quick_scan floating point comparison to isclose #11929
Conversation
Yes, I'm happy with this
sklearn/cluster/tests/test_optics.py (outdated)
@@ -190,226 +189,65 @@ def test_reach_dists():
# Expected values, matches 'RD' results from:
# http://chemometria.us.edu.pl/download/optics.py
Have you made sure the results indeed match with this implementation?
@lesteve I actually had not; somehow I missed this comment. But I did now, and the results are not the same, not even close actually. I also checked whether the values I'm replacing match the output of that code, and they don't either. So at some point in time, either somebody forgot to check against the code provided at that URL, or the code at that URL has changed and no longer matches what it used to be. @espg any idea?
I'll check for the IPython notebook where I generated the values; I think I remember having to sort the output by the ordering indexes to get them to match...
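For what it's worth, a minimal sketch of the kind of reordering being described, assuming the OPTICS estimator API as it exists in recent scikit-learn releases (reachability_ indexed by sample, ordering_ giving the processing order); reference_rd is a hypothetical placeholder for the other script's output:

```python
import numpy as np
from sklearn.cluster import OPTICS

X = np.random.RandomState(0).randn(100, 2)
clust = OPTICS(min_samples=10).fit(X)

# reachability_ is indexed by sample, while ordering_ gives the order in
# which points were processed; indexing one by the other yields the
# reachability values in traversal order, which is how a script that prints
# values as it processes points would report them.
reach_in_processing_order = clust.reachability_[clust.ordering_]

# A reference script's output (hypothetical `reference_rd`) would then be
# compared against this reordered array rather than reachability_ directly:
# np.allclose(reach_in_processing_order, reference_rd)
```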
sklearn/cluster/tests/test_optics.py (outdated)
C4 = [-2, 3] + .3 * rng.randn(n_points_per_cluster, 2)
C5 = [3, -2] + 1.6 * rng.randn(n_points_per_cluster, 2)
C6 = [5, 6] + 2 * rng.randn(n_points_per_cluster, 2)
n_points_per_cluster = 50
Why do you change the test here? I'm fine with reducing n_points_per_cluster, but changing .8 * rng.randn(n_points_per_cluster, 2) to 8 * rng.randn(n_points_per_cluster, 2) seems strange. It's hard to form a cluster with your version.
The issue was kinda related to floating point precision, which is why we had the idea to scale up the data. But you're right, to scale the data I also need to scale the centers. I'll do so.
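A sketch of what scaling both the spread and the centers might look like, written in the test's generation style (C1 = [-5, -2] + .8 * rng.randn(...)); the scale factor below is illustrative, not a value agreed on in this thread:

```python
import numpy as np

rng = np.random.RandomState(0)
n_points_per_cluster = 50
scale = 100  # illustrative factor; the thread later mentions upscaling by 100

# Original style: centers a few units apart, spread below ~2.
C1 = [-5, -2] + .8 * rng.randn(n_points_per_cluster, 2)

# Scaling only the spread (.8 -> 8 or 80) smears the clusters together;
# the centers have to be multiplied by the same factor to keep the same
# relative geometry at a larger absolute scale.
C1_scaled = (np.array([-5, -2]) * scale
             + .8 * scale * rng.randn(n_points_per_cluster, 2))
```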
How about keeping the original test? Or is it that the original test cannot detect your problem, but the new test can?
Also, I don't think it's good to get the expected result from our own implementation. Where does your expected result come from?
The original test fails on 32 bit systems.
The related conversation happened after this comment: #11857 (comment)
The original test fails on 32 bit systems.

I assume this PR aims to solve that problem. Seems that I'm wrong? Is it impossible to pass the original test on a 32-bit system? If so, is it possible to only reduce n_points_per_cluster? If not, then I think it's acceptable to scale up the data, as long as you get the expected result from another implementation (e.g., R).
I could do that, but are you opposed to changing the == to np.isclose when comparing floats?
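For context, a minimal illustration of the difference being proposed; nothing here is specific to OPTICS:

```python
import numpy as np

a = 0.1 + 0.2   # 0.30000000000000004 due to binary floating point
b = 0.3

print(a == b)            # False: the two doubles differ in the last bit
print(np.isclose(a, b))  # True: equal within a relative/absolute tolerance
```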
I can't find any detailed information about the code from http://chemometria.us.edu.pl/download/optics.py. Is it cited or mentioned elsewhere? Maybe it's better to compare with R?
No, we've encountered similar problems (e.g., #11551). It seems that both the old test and the new test pass on master and in this PR (correct me if I'm wrong), so I'd prefer a +1/-1 PR unless you give me a reason to change the test. But there's another problem here: it seems that the performance of OPTICS will be affected if we use np.isclose in Cython code. I'm not sure what the solution is (maybe some low-level function).
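As a rough sense of that concern: called once per pair of scalars (the pattern an inner Cython loop would have), np.isclose pays ufunc and array-conversion overhead on every call, which is presumably why a C-level helper is being considered. A quick, machine-dependent comparison:

```python
import timeit

n = 10_000

# np.isclose is a vectorised function; per-scalar calls pay Python/ufunc
# overhead on every invocation.
t_np = timeit.timeit("np.isclose(0.1 + 0.2, 0.3)",
                     setup="import numpy as np", number=n)

# math.isclose is a thin C-level function operating on two scalars.
t_math = timeit.timeit("math.isclose(0.1 + 0.2, 0.3)",
                       setup="import math", number=n)

print(t_np, t_math)  # np.isclose is markedly slower per scalar call
```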
It passes the tests because we don't build/test for 32 bit systems on this repo. That's why @jnothman sent this quick fix (#11916). The issue is not fixed yet on master. On top of that, the current test on master fails on 32 bit arch even with the … I'm trying to figure out a faster way than …
Regarding the …, I've put it in this PR, since it's supported only from Python 3.5.
Yes, but my point here is that if the referenced implementation is not cited/mentioned elsewhere (indicating that it's not a popular implementation), this implementation might have some bugs.
Agree, it's good to compare with the implementation from the original author. @espg Your contribution here will be very welcome. But I think it's not the most urgent thing; for now, we need to ensure the correctness of our implementation. @adrinjalali For the current PR, I still fail to understand why we need to change the test.
Yes, it fails on 32 bit machines, but we actually can't find the reason, right? If so, I prefer to keep the test to make the PR minimal. We've already skipped some tests on 32 bit machines. Regarding the PR itself, I'm not against getting rid of ==, but I would like to see a test which fails on master but passes in this PR. I think it will be difficult, so I won't insist on it.
Thanks @espg (though I don't have time to read them now), I'm now much more confident about using the implementation.
We know the broad reason it fails on 32bit: results rely discretely on a sort over two columns of floats. We only use the second column if adjacent values in the first are equal. But there we rely on float equality. Numerical imprecision results in different things being equal in the 32 bit case than in the 64 bit case.
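A toy reconstruction of that failure mode (this is not the actual quick_scan code; the function and data below are purely illustrative): when the tie-break on the second column is only consulted for exactly equal values in the first column, a last-bit difference decides which element wins, and that difference can vary between builds.

```python
import numpy as np

def pick(primary, secondary, close=None):
    """Index of the smallest primary value; ties broken on secondary.
    `close` decides what counts as a tie (exact equality by default)."""
    if close is None:
        close = lambda a, b: a == b
    best = 0
    for i in range(1, len(primary)):
        if close(primary[i], primary[best]):
            if secondary[i] < secondary[best]:
                best = i
        elif primary[i] < primary[best]:
            best = i
    return best

# Index 0's primary value should equal index 1's, but carries last-bit noise,
# the kind of discrepancy that can differ between 32 bit and 64 bit builds.
primary = np.array([0.1 + 0.2, 0.3])   # 0.30000000000000004 vs 0.3
secondary = np.array([1.0, 5.0])

print(pick(primary, secondary))              # 1: exact == never sees a tie
print(pick(primary, secondary, np.isclose))  # 0: tie-break on secondary fires
```

On a platform where both values round to exactly the same double, the == version would also return 0, which is how the two architectures end up following different paths.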
This is what I can add:
and comparing the values between the 32 and 64 bit archs, it's the … That means the test fails on a 32 bit arch because of a precision issue, but that precision issue happens at a point which makes the algorithm follow different paths on the two archs, i.e. …

… does NOT impose any computation penalty, i.e.: …

The above proposal would make the test pass on both archs, without any change to the test. But as @jnothman also pointed out (#11857 (comment)):

3.2. Upscale the space of the test by a factor of 100
3.3. Reduce the number of points in each cluster

I really prefer a test which passes regardless of the arch, rather than having two tests. That would mean the only thing left is to get the … Does it seem reasonable, @espg, @jnothman, @qinhanmin2014?
I'm fine with using 50 points per cluster if it fixes all of our problems; the only reason that 250 was used was to match the unit test to the documentation example. The 'chemometria' OPTICS implementation that I used for the test array is here, although note that it has a dependency of …
I'm very happy with that analysis and the proposal to reduce the number of samples per cluster.
sklearn/cluster/_optics_inner.pyx (outdated)
@@ -5,6 +5,9 @@ cimport cython
ctypedef np.float64_t DTYPE_t
ctypedef np.int_t DTYPE

cdef isclose(double a, double b, double rel_tol=1e-09, double abs_tol=0.0):
Use cdef inline rather than relying on the C compiler to work it out.
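For reference, the defaults in that signature match math.isclose, whose PEP 485 check is the usual choice for such a helper; a pure-Python sketch of that formula (it is an assumption here that the PR's helper implements the same check, and in the .pyx file it would be declared cdef inline per the suggestion above):

```python
def isclose(a, b, rel_tol=1e-09, abs_tol=0.0):
    # PEP 485: true when the difference is within rel_tol of the larger
    # magnitude, or within abs_tol in absolute terms.
    return abs(a - b) <= max(rel_tol * max(abs(a), abs(b)), abs_tol)

print(isclose(0.1 + 0.2, 0.3))  # True
print(0.1 + 0.2 == 0.3)         # False
```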
Thanks @adrinjalali and @espg
@espg I've reverted the test to master, since I couldn't generate the expected values with your script (I checked the ordering issue and a possibly missing sqrt, still no match). Could you please send a PR with 50 points in each group? No need to change the C1 to C6 definitions.
LGTM, ping @jnothman
@adrinjalali Sure, working on it now.
I'm running into weirdness... identical results between the two implementations when using n=250 per group, but divergent answers when using n=50. Not sure what to do here; apparently the implementations only agree on datasets above a certain size. So, using n=50 will cause the comparison to fail. Using 250 will cause them to agree, but then the numerical instability will cause the 32-bit comparisons to diverge.

I like this test quite a lot, as it directly verifies that the reachability values aren't changing or drifting... but I'm at a loss with the other OPTICS code and why it only agrees on large datasets. We could change to 50 points per cluster and use the output from OPTICS at merge time... that would fix the 32-bit issue and let us know if we accidentally implemented any changes that alter the core numeric output of the algorithm. But we would only be looking internally, and not checking against another independent implementation. Are we comfortable with that at this point?
Interesting. The first thing that comes to mind is the effect of min_samples. How does varying that affect agreement?
But we should also consider whether we have enough concerns with our implementation to not release in 0.20.
min_samples doesn't seem to change things much; setting different values still gives disagreement between the implementations at lower densities. There is an offset between the implementations: the Brian Clowers code doesn't include the point under consideration in the density estimation, while ours does. In practice, this means that the outputs are identical in the 250 points per cluster case if we set …
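A small illustration of that offset, assuming core distances are taken as the distance to the min_samples-th nearest neighbour; the data and k below are illustrative, not the values used in the test:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.RandomState(0)
X = rng.randn(20, 2)
k = 5

nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
dist, _ = nn.kneighbors(X)  # column 0 is each query point itself, at distance 0

# If the point under consideration counts towards its own neighbourhood,
# "min_samples = k" means the (k-1)-th column; if it doesn't, it means the
# k-th column. The two conventions differ by exactly one neighbour rank.
core_dist_including_self = dist[:, k - 1]
core_dist_excluding_self = dist[:, k]

print(core_dist_including_self[:3])
print(core_dist_excluding_self[:3])  # always >= the line above
```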
Do you think we should be excluding that point from the density estimation?
@jnothman our definition of whether or not …
Okay. So let's make sure that discrepancy from other implementations is noted in the documentation here.
Anything left here that I should do? I guess it can be merged.
Thanks @adrinjalali
See #11857, #11916, #11878, and MacPython/scikit-learn-wheels#7
The part of the fix which I believe is valid is the change to the comparison between two floats.
Other than that, scaling up the locations of the points and reducing the number of points in each group fixes the test issue while still keeping the test valid (IMO).
Does this look okay, @jnothman, @espg?