Skip to content

FIX ‘sparse’ kwarg was not used by fowlkes_mallows_score #28981

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Apr 23, 2025

Conversation

cynddl
Copy link
Contributor

@cynddl cynddl commented May 8, 2024

Reference Issues/PRs

No issue.

What does this implement/fix? Explain your changes.

The function sklearn.metrics.fowlkes_mallows_score was defined with:

def fowlkes_mallows_score(labels_true, labels_pred, *, sparse=False):

but the code never uses sparse. The parameter sparse should have been passed to contingency_matrix, but instead we currently see a few lines later:

    contingency_matrix(labels_true, labels_pred, sparse=True)

This commit fixes the inconsistency by:

  • changing the default to sparse=True so that it doesn't break any test or downstream use
  • making sure that contingency_matrix uses the given sparse parameter.

Any other comments?

The function sklearn.metrics.fowlkes_mallows_score has never been tested with sparse=False, it might be good to have a few tests for that use case too?

@cynddl cynddl changed the title Sparse kwarg was not used by fowlkes_mallows_score FIX ‘sparse’ kwarg was not used by fowlkes_mallows_score May 8, 2024
Copy link

github-actions bot commented May 8, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 9b0be6c. Link to the linter CI: here

Copy link

@prabashbala prabashbala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it be possible to add a test for this please.

Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @cynddl . Could you please add a test as well?

@@ -1181,7 +1181,7 @@ def normalized_mutual_info_score(
},
prefer_skip_nested_validation=True,
)
def fowlkes_mallows_score(labels_true, labels_pred, *, sparse=False):
def fowlkes_mallows_score(labels_true, labels_pred, *, sparse=True):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't change the default here though.

@dschult
Copy link
Contributor

dschult commented Mar 25, 2025

Setting sparse=False after this PR would break the algorithm. It is set up for computing with a sparse contingency matrix.

Instead of "making this keyword argument work" I think it makes sense to "remove his keyword argument". It is not used, has not been used and provides no advantages.

@adrinjalali
Copy link
Member

adrinjalali commented Apr 2, 2025

That's true, the code actually relies on the output (c) being sparse. So we can deprecate the parameter here.

Copy link
Member

@jeremiedbb jeremiedbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made the requested changes to deprecate the param instead.

LGTM.

@adrinjalali adrinjalali merged commit 7cc6032 into scikit-learn:main Apr 23, 2025
36 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants