Skip to content

ENH Add sparse input support to OPTICS #22965

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 41 commits into from
May 2, 2022

Conversation

Micky774
Copy link
Contributor

@Micky774 Micky774 commented Mar 27, 2022

Reference Issues/PRs

Fixes #11982
Resolves #14736 (stalled)
Resolves #20802 (stalled)

What does this implement/fix? Explain your changes.

PRs #14736 and #20802: Add support for sparse matrices in OPTICS.fit
This PR: updates work and provides continuation for review.

Any other comments?

@Micky774 Micky774 changed the title [MRG] Make OPTICS support sparse input [MRG] Add sparse input support to OPTICS Mar 27, 2022
Copy link
Member

@jeremiedbb jeremiedbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what to think about these warnings. Maybe we should not filter them since they indicate that the use of NearestNeighbors is not optimal. At least maybe there's something we can do to the precomputed dist matrix to not trigger all of them.

I think not all tests need to be parametrized by metric and run on dense and sparse, especially those who just do error checking or sanity checks.

@Micky774
Copy link
Contributor Author

Micky774 commented Apr 1, 2022

@jeremiedbb I have

  1. Converted lil_matrix constructions to csr_matrix
  2. Improved the docstring for metric by including a line stating that Sparse matrices are only supported by scikit-learn metrics.
  3. Removed the warning filters so that they are passed regularly
  4. Streamlined the tests by removing sparse parametrization for several tests, keeping the parametrization only for testing the correctness of the algorithm, DBSCAN parity, and the ability to input sparse precomputed distances.

@Micky774
Copy link
Contributor Author

Micky774 commented Apr 3, 2022

Note that the failing test is test_ridge_regression_vstacked_X[wide-5-True-sag] and consequently not related to this PR

Copy link
Member

@jeremiedbb jeremiedbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I just think that you added some unreachable code, but that needs confirmation

Copy link
Member

@jeremiedbb jeremiedbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM besides the sparse efficiency warning thing.

@jeremiedbb jeremiedbb added this to the 1.2 milestone Apr 29, 2022
Micky774 and others added 2 commits April 30, 2022 18:14
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
@thomasjpfan thomasjpfan changed the title [MRG] Add sparse input support to OPTICS ENH Add sparse input support to OPTICS May 1, 2022
Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment on lines +616 to +617
dists.sort_indices()
dists = dists.data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jeremiedbb This is a change from your previous approval. Are you okay with this?

@jeremiedbb jeremiedbb merged commit af5b6a1 into scikit-learn:main May 2, 2022
@jeremiedbb
Copy link
Member

Thanks @Micky774 !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

OPTICS should support sparse matrices and precomputed distances
5 participants