[MRG+1] OPTICS correctly handle multiple infs in reachability array. #12029

adrinjalali · 2018-09-06T16:09:11Z

The extraction part of the code assumes that only the first value of `reachability_` is inf, but the array may include multiple infs. This PR fixes the issue, and fixes a test which used to result in all inf values. It also adds a ValueError for the case where all reachability values are inf, along with a test for that ValueError.
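For illustration only, here is a minimal sketch of the kind of normalization described above. It is not the code in `optics_.py`; the helper name `normalize_reachability` and the error message are assumptions. With more than one inf, `np.max(reachability[1:])` is itself inf, so dividing by it turns finite values into 0 and inf values into nan. The sketch scales by the largest finite value instead and raises when there is none.

```python
import numpy as np


def normalize_reachability(reachability):
    """Hypothetical helper: scale reachability by its largest finite value.

    Infinite entries stay infinite. If no entry is finite, a ValueError
    is raised, mirroring the behaviour described in this PR.
    """
    reachability = np.asarray(reachability, dtype=float)
    finite = reachability[np.isfinite(reachability)]
    if finite.size == 0:
        # Assumed wording; the real message may differ.
        raise ValueError("All reachability values are inf.")
    # Note: this sketch does not guard against a largest finite value of 0.
    return reachability / finite.max()


scaled = normalize_reachability([np.inf, 1.0, 2.0, np.inf, 4.0])
print(scaled)
```

Indexing with `reachability[1:]` only skips the first inf, which is why the original normalization breaks as soon as a second inf appears anywhere else in the array.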

This is yet another PR to extract parts of #11857.

Also see #11857 (review)

Ping @jnothman

jnothman · 2018-09-07T01:01:27Z

Please credit yourself for the OPTICS feature in what's new v0.20

jnothman · 2018-09-07T01:01:56Z

I'm not sure why the Python 2 CI is suddenly failing.

adrinjalali · 2018-09-15T11:42:41Z

This seems ready to me. A second review is appreciated.

qinhanmin2014 · 2018-10-07T08:04:45Z

sklearn/cluster/optics_.py

@@ -606,7 +606,13 @@ def _extract_optics(ordering, reachability, maxima_ratio=.75,
    """

    # Extraction wrapper
-    reachability = reachability / np.max(reachability[1:])
+    # according to the paper (p. 5), for a small enough generative distance


Which paper? Please at least provide the authors and the year.

qinhanmin2014 · 2018-10-07T08:13:09Z

sklearn/cluster/optics_.py

@@ -751,17 +757,17 @@ def _cluster_tree(node, parent_node, local_maxima_points,
    avg_reach2 = np.mean(reachability_plot[node_2.start:(node_2.start
                                                         + check_value_2)])

-    if ((avg_reach1 / reachability_plot[s]) > maxima_ratio or
-            (avg_reach2 / reachability_plot[s]) > maxima_ratio):
+    if ((avg_reach1 / maxima_ratio) > reachability_plot[s] or


(Not related) I'm wondering whether we need both maxima_ratio and rejection_ratio. Too many parameters make our OPTICS somewhat unfriendly, at least to me.

Any insights from your side? @adrinjalali

If I understand it correctly, setting the two parameters to the same value would result in each split point accepting only one of the children, since the next condition is exactly the opposite. But I agree this method has quite a few parameters, which can be confusing, especially if the user hasn't read both the OPTICS paper and the paper for this method (which we're calling SQLNK, I guess).

> But I agree this method has quite a few parameters, which can be confusing, especially if the user hasn't read both the OPTICS paper and the paper for this method (which we're calling SQLNK, I guess).

Did you find these two parameters in a paper? Which one? I can only find one:

    if ((average reachability value in any node in NL) / s.r_dist > 0.75)  // if split point s is not significant, ignore s and continue
        cluster_tree(N, parent_of_N, L);
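As a rough sketch of that check, written in the rearranged form used in the diff above: the function name `split_is_insignificant` and its arguments are hypothetical and this is not the code in `_cluster_tree`. For a positive, finite reachability at the split point and a positive `maxima_ratio`, `avg / r > maxima_ratio` and `avg / maxima_ratio > r` are equivalent. The second form never divides by the reachability at the split point, so a zero or infinite value there cannot raise or produce nan.

```python
def split_is_insignificant(avg_reach_children, reach_at_split,
                           maxima_ratio=0.75):
    """Hypothetical helper: True if the split point should be ignored.

    avg_reach_children: average reachability of each child node.
    reach_at_split: reachability value at the candidate split point s.
    """
    # Rearranged comparison: divide by the (positive) ratio rather than
    # by the reachability at the split point.
    return any(avg / maxima_ratio > reach_at_split
               for avg in avg_reach_children)


# A child whose average reachability is close to the split value makes
# the split insignificant; an infinite split value is always significant.
print(split_is_insignificant([0.9, 0.2], 1.0))
print(split_is_insignificant([0.9, 0.2], float("inf")))
```

The default of 0.75 is taken from the quoted pseudocode; whether a separate rejection threshold is needed is the open question in this thread.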

Apart from these two parameters, some other parameters also seem strange, e.g., significant_min (we're normalizing RD, so I don't think it makes sense to users) and the magic check_ratio inside the implementation. I notice that amyxzhang noted that this is "An implementation of the following algorithm, with some minor add-ons". I think we need to check these add-ons carefully. Am I missing something, @adrinjalali? Otherwise I'm going to open an issue. Thanks in advance.

qinhanmin2014

LGTM, thanks @adrinjalali

adrinjalali added 2 commits September 6, 2018 18:00

handle multiple infs

23568c2

fix and add tests

db5532a

jnothman approved these changes Sep 6, 2018

jnothman added this to the 0.20 milestone Sep 6, 2018

credit in whats_new

d5d8022

adrinjalali changed the title ~~OPTICS correctly handle multiple infs in reachability array.~~ [MRG+1] OPTICS correctly handle multiple infs in reachability array. Sep 7, 2018

This was referenced Sep 10, 2018

OPTICS detecting the wrong outlier #11677

Closed

Remove OPTICS from 0.20 #12053

Merged

jnothman modified the milestones: 0.20, 0.21 Sep 17, 2018

merge master

34bf80c

qinhanmin2014 approved these changes Oct 7, 2018

add reference to the OPTICS paper in the comment.

da271e0

qinhanmin2014 approved these changes Oct 7, 2018

qinhanmin2014 merged commit 2020867 into scikit-learn:master Oct 7, 2018

adrinjalali deleted the optics/infs branch October 7, 2018 09:01
