Skip to content

DOC correct input array type in MeanShift.predict #20796

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Sep 23, 2021

Conversation

milana2
Copy link
Contributor

@milana2 milana2 commented Aug 21, 2021

Reference Issues/PRs

Fixes Issue #20733. See also Issue #20049 and PR #20117.

What does this implement/fix? Explain your changes.

A minor change of the call to validate_data to allow MeanShift.predict to work with sparse input.

Any other comments?

Apologies for mistakes, this is my first PR. Thanks!

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR @milana2 !

The linting test is failing. To fix it you can run pip install black==21.6b0 and run black .

We also need a new test in sklearn/cluster/tests/test_mean_shift.py to make sure that sparse matrices now work.

@thomasjpfan thomasjpfan changed the title Allowing sparse input in MeanShift.predict. FIX Allowing sparse input in MeanShift.predict. Aug 21, 2021
@milana2
Copy link
Contributor Author

milana2 commented Aug 21, 2021

Thank you @thomasjpfan! I think I have fixed the PR.

One observation is that MeanShift.fit_predict(X) won't work with sparse input. It defaults to the superclass fit_predict then yo-yos back down to MeanShift.fit(X), which doesn't work with sparse input. I'm not sure if this is an issue.

@thomasjpfan
Copy link
Member

I'm not sure if this is an issue.

It can be an issue. fit does not support sparse matrices, so having predict support sparse matrices is a bit inconsistent.

I am okay with predict supporting sparse matrices while fit does not. I think another maintainers opinion would help see if we want this change.

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the follow up! Some more comments on the code.

milana2 and others added 2 commits August 21, 2021 16:37
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
@milana2
Copy link
Contributor Author

milana2 commented Aug 21, 2021

Thank you!

If the team decides against the change then we need to update the documentation. Currently the doc says MeanShift.predict accepts X : {array-like, sparse matrix}.

@glemaitre
Copy link
Member

I don't really like the inconsistency. It is unlikely to fit a dense X and predict a sparse X with such clustering.
I would prefer to update the documentation stating that we only support array-like.

@glemaitre
Copy link
Member

@milana2 Could you revert the changes and just update the documentation.

@thomasjpfan
Copy link
Member

I don't really like the inconsistency. It is unlikely to fit a dense X and predict a sparse X with such clustering. I would prefer to update the documentation stating that we only support array-like.

I am okay with this option. @milana2 Did you have a use case, where you fit on a dense X and predict on a sparse X?

@milana2
Copy link
Contributor Author

milana2 commented Sep 13, 2021

@milana2 Could you revert the changes and just update the documentation.

Will do.

@milana2 Did you have a use case, where you fit on a dense X and predict on a sparse X?

We don't have a use case, no. (While working on a project, we noticed inconsistencies in the handling of sparse data and we ran an analysis to look for inconsistencies between code and documentation. We have reported two of the cases we found. There are others. I can open an Issue with the full list if you think that would be useful?)

@glemaitre glemaitre changed the title FIX Allowing sparse input in MeanShift.predict. DOC correct input array type in MeanShift.predict Sep 23, 2021
@glemaitre glemaitre merged commit 98ef23d into scikit-learn:main Sep 23, 2021
@glemaitre
Copy link
Member

Thanks @milana2 Looks good.

@glemaitre glemaitre mentioned this pull request Oct 23, 2021
10 tasks
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Oct 23, 2021
Co-authored-by: Ana Milanova <milana2@rpi.edu>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
glemaitre pushed a commit that referenced this pull request Oct 25, 2021
Co-authored-by: Ana Milanova <milana2@rpi.edu>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
samronsin pushed a commit to samronsin/scikit-learn that referenced this pull request Nov 30, 2021
Co-authored-by: Ana Milanova <milana2@rpi.edu>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants