Skip to content

FIX LabelPropagation handling of sparce matrices #17085 #17384

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

goerch
Copy link

@goerch goerch commented May 29, 2020

Reference Issues/PRs

Fixes #17085

What does this implement/fix? Explain your changes.

Label propagation and spreading allow to classify using sparse data according to documentation. Tests only covered the dense case. Newly added coverage for sparse matrices allows to reproduce the problem in #17085. The proposed fix in #17085 works for the extended tests.

Any other comments?

  • Supporting scipy's dok_matrix produces the UserWarning "Can't check dok sparse matrix for nan or inf.". So this format seems to be unsuitable?
  • test_label_propagation_closed_form fails for sparse matrices

Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @goerch . This also needs a regression test specific to the issue it fixes.

@@ -190,7 +190,7 @@ class labels.
"""
check_is_fitted(self)

X_2d = check_array(X, accept_sparse=['csc', 'csr', 'coo', 'dok',
X_2d = check_array(X, accept_sparse=['csc', 'csr', 'coo',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this use _validate_data? ping @NicolasHug

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also really not sure why this fixes the issue

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this use _validate_data

We only use it in fit for now, so check_array is fine here

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback @adrinjalali: PR #17085 already contains tests extended to the sparse case. Could you please advise a newbie regarding regression tests?

clf = label_propagation.LabelSpreading(max_iter=10000, alpha=alpha)
clf.fit(X, y)
assert_array_almost_equal(expected, clf.label_distributions_, 4)
for sparse_or_dense in SPARSE_OR_DENSE:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you please parameterize the test for this one and alpha? Also for the other tests

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adrinjalali: I didn't write the original tests, so I'm probably not the right one to ask for improving these with regard to parametrization.

@adrinjalali adrinjalali changed the title Test and fix for issue #17085 FIX LabelPropagation handling of sparce matrices #17085 Jul 14, 2020
Base automatically changed from master to main January 22, 2021 10:52
@cozek
Copy link
Contributor

cozek commented Mar 11, 2021

I am taking over this issue in #19664

@cmarmo cmarmo added Superseded PR has been replace by a newer PR and removed help wanted labels Mar 12, 2021
@cmarmo
Copy link
Contributor

cmarmo commented May 9, 2022

I am closing this pull request as superseded by #19664.

@cmarmo cmarmo closed this May 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

LabelPropagation raises TypeError: A sparse matrix was passed
5 participants