Skip to content

MAINT Parameters validation for cluster.estimate_bandwidth #24869

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

OmarManzoor
Copy link
Contributor

Reference Issues/PRs

Towards #24862

What does this implement/fix? Explain your changes.

  • Added the validate_params decorator to cluster.estimate_bandwidth

Any other comments?

@glemaitre
Copy link
Member

For the moment, we don't validate parameters. However, we only call once estimate_bandwidth and it is quite time-consuming in regards to the validation parameters time. I think this is worth adding a parameter validation, knowing that we don't validate parameters in MeanShift because we are using the defaults.

"random_state": ["random_state"],
"n_jobs": [Integral, None],
}
)
def estimate_bandwidth(X, *, quantile=0.3, n_samples=None, random_state=0, n_jobs=None):
"""Estimate the bandwidth to use with the mean-shift algorithm.

Copy link
Member

@glemaitre glemaitre Nov 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we should update the comment below:

This function takes time at least quadratic in `n_samples`. For large
datasets, it is wise to subsample by setting `n_samples`. Alternatively,
the parameter `bandwidth` can be set to a small value without estimating
it.

Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should remove the test sklearn.cluster.tests.test_estimate_bandwidth_with_sparse_matrix since they are covered by the common test.

@jeremiedbb jeremiedbb added the Validation related to input validation label Nov 18, 2022
Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @OmarManzoor

Copy link
Member

@jeremiedbb jeremiedbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @OmarManzoor

@jeremiedbb jeremiedbb merged commit 2d475c0 into scikit-learn:main Nov 30, 2022
@OmarManzoor OmarManzoor deleted the validate_params_for_estimate_bandwidth branch November 30, 2022 13:56
jjerphan pushed a commit to jjerphan/scikit-learn that referenced this pull request Jan 20, 2023
jjerphan pushed a commit to jjerphan/scikit-learn that referenced this pull request Jan 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants