Skip to content

[MRG+1] Fix optics metric issues (DOC and precomputed) #12028

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Sep 17, 2018

Conversation

adrinjalali
Copy link
Member

Fixes #12009
See #11982

Still doesn't support the output of radius_neighbors_graph as input though.

Question: what's the criterion to include one's name in the authors section in the files? Are those only the original authors or do I count as an author in the optics_.py and test_optics.py now?

@adrinjalali adrinjalali changed the title Fix optics metic issues (DOC and precomputed) Fix optics metric issues (DOC and precomputed) Sep 6, 2018
@adrinjalali
Copy link
Member Author

I'm not sure about this travis error, is it because of something I did?

/home/travis/build/scikit-learn/scikit-learn/testvenv/local/lib/python2.7/site-packages/py/_path/common.py:377: in visit
    for x in Visitor(fil, rec, ignore, bf, sort).gen(self):
/home/travis/build/scikit-learn/scikit-learn/testvenv/local/lib/python2.7/site-packages/py/_path/common.py:419: in gen
    if p.check(dir=1) and (rec is None or rec(p))])
/home/travis/build/scikit-learn/scikit-learn/testvenv/local/lib/python2.7/site-packages/_pytest/main.py:561: in _recurse
    ihook = self.gethookproxy(path)
/home/travis/build/scikit-learn/scikit-learn/testvenv/local/lib/python2.7/site-packages/_pytest/main.py:416: in gethookproxy
    my_conftestmodules = pm._getconftestmodules(fspath)
/home/travis/build/scikit-learn/scikit-learn/testvenv/local/lib/python2.7/site-packages/_pytest/config/__init__.py:395: in _getconftestmodules
    mod = self._importconftest(conftestpath)
/home/travis/build/scikit-learn/scikit-learn/testvenv/local/lib/python2.7/site-packages/_pytest/config/__init__.py:431: in _importconftest
    raise ConftestImportFailure(conftestpath, sys.exc_info())
E   ConftestImportFailure: ModuleDeprecationWarning('The oldnumeric module will be dropped in Numpy 1.9',)
E     File "/home/travis/build/scikit-learn/scikit-learn/sklearn/__init__.py", line 64, in <module>
E       from .base import clone
E     File "/home/travis/build/scikit-learn/scikit-learn/sklearn/base.py", line 11, in <module>
E       from scipy import sparse
E     File "/usr/lib/python2.7/dist-packages/scipy/__init__.py", line 77, in <module>
E       from numpy import oldnumeric
E     File "/usr/lib/python2.7/dist-packages/numpy/oldnumeric/__init__.py", line 11, in <module>
E       warnings.warn(_msg, ModuleDeprecationWarning)

@jnothman
Copy link
Member

jnothman commented Sep 6, 2018 via email

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should accept a sparse X as well, but that may be a separate pr.

Can you test metrics supported by neighbors and by metrics.pairwise and all their combinations. I'm a bit worried about us mixing them

@adrinjalali
Copy link
Member Author

I just checked the metrics supported by NearestNeighbors and pairwise_distances, and they're the same. They're just documented differently. There's just the matching distance which is an alias for hamming in scipy, which is not included in the docstring of NearestNeighbors, but is supported (it is included in pairwise_distances's docstring).

I also tried running OPTICS with all those metrics, and as long as the NearestNeighbors algorithm is set to auto, they all work.

@adrinjalali
Copy link
Member Author

@jnothman, regarding the sparse X, it seems it's the NearestNeighbors.kneighbors(...) which is the one not supporting a sparse input. The following code throws an exception:

from sklearn.neighbors import (radius_neighbors_graph,
                               NearestNeighbors)
import numpy as np

rng = np.random.RandomState(0)
n_points_per_cluster = 50

C1 = [-5, -2] + .8 * rng.randn(n_points_per_cluster, 2)
C2 = [4, -1] + .1 * rng.randn(n_points_per_cluster, 2)
C3 = [1, -2] + .2 * rng.randn(n_points_per_cluster, 2)
C4 = [-2, 3] + .3 * rng.randn(n_points_per_cluster, 2)
C5 = [3, -2] + 1.6 * rng.randn(n_points_per_cluster, 2)
C6 = [5, 6] + 2 * rng.randn(n_points_per_cluster, 2)
X = np.vstack((C1, C2, C3, C4, C5, C6))

dists = radius_neighbors_graph(X, radius=10)

nn = NearestNeighbors(metric='precomputed', algorithm='auto').fit(dists)
res = nn.kneighbors(dists, 10)

Which throws:

AxisError: axis 1 is out of bounds for array of dimension 1

@jnothman
Copy link
Member

jnothman commented Sep 8, 2018 via email

@jnothman jnothman added this to the 0.20 milestone Sep 9, 2018
@adrinjalali adrinjalali changed the title Fix optics metric issues (DOC and precomputed) [MRG+1] Fix optics metric issues (DOC and precomputed) Sep 10, 2018
@adrinjalali
Copy link
Member Author

I guess we can handle the sparse input in a separate PR, should we merge this?

Copy link
Member

@qinhanmin2014 qinhanmin2014 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @adrinjalali

@qinhanmin2014 qinhanmin2014 merged commit efe7b8c into scikit-learn:master Sep 17, 2018
@adrinjalali adrinjalali deleted the optics/metric branch September 17, 2018 07:37
@huntzhan
Copy link
Contributor

Hi team, thanks for your great work! I was wondering if there's any plan to support sparse input for OPTICS recently?

@jnothman
Copy link
Member

jnothman commented Aug 22, 2019 via email

@huntzhan
Copy link
Contributor

Meaning the csr_matrix of X support in OPTICS.fit(self, X, y=None)

def fit(self, X, y=None, sample_weight=None):
"""Perform DBSCAN clustering from features, or distance matrix.
Parameters
----------
X : array-like or sparse matrix, shape (n_samples, n_features), or \
(n_samples, n_samples)
Training instances to cluster, or distances between instances if
``metric='precomputed'``. If a sparse matrix is provided, it will
be converted into a sparse ``csr_matrix``.
sample_weight : array, shape (n_samples,), optional
Weight of each sample, such that a sample with a weight of at least
``min_samples`` is by itself a core sample; a sample with a
negative weight may inhibit its eps-neighbor from being core.
Note that weights are absolute, and default to 1.
y : Ignored
Not used, present here for API consistency by convention.
Returns
-------
self
"""
X = check_array(X, accept_sparse='csr')

@jnothman
Copy link
Member

jnothman commented Aug 22, 2019 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

OPTICS metric param docstring and parameter handling needs improvements
4 participants