-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
Make automatic validation for all scikit-learn public functions #24862
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Hi can I work on sklearn.cluster.spectral_clustering |
Hi ! Can I try to do the sklearn.cluster.affinity_propagation function ? |
Yes sure, no need to ask. Just commenting that you are working on it is enough :) |
Hi. I can work on sklearn.cluster.mean_shift |
I can work on |
hi! I am working on ward_tree |
|
|
@glemaitre I think |
Working on |
WIP |
Working on |
since |
@emirkmo no we don't need to validate separately, if the underlying class already does it. |
Then |
Working on |
Where can I get details about |
Look at any previously merged PR from this thread. They will give a good overview of what is required. |
Actually I was looking for what parameters and values to put in the dictionary |
Hi, |
Working on |
@jeremiedbb It seems like |
Hi @glemaitre , I can see that validation parameters in the files of the ensemble functions have been added. However, the list in this file does not contain any ensemble function. Kindly correct me if I am wrong. scikit-learn/sklearn/tests/test_public_functions.py Lines 186 to 189 in fa87f28
|
is the utils.extmath.weighted_mode validation done? @glemaitre ?? |
Is there anything left to be validated? I checked everything and didn't find any to work on. I am new to this so I am not sure if I ACTUALLY checked everything + a few of them have open PRs from a while ago I suppose, am I right? Can we work on those? |
@lpsilvestrin |
I updated the list of functions that are not valiated yet #24862 (comment). All of them have open PRs except |
Hi @jeremiedbb , Hope you are doing well. I am interested in this pull request. May I proceed to work on this? |
Sure go ahead ! |
Hi @jeremiedbb , I can see that validation has already been added as you can see below. Can you please guide me if I misunderstood your former comment? scikit-learn/sklearn/neighbors/_graph.py Lines 46 to 58 in 989d299
|
The function to validate is |
Hi @lpsilvestrin , can i proceed with |
Hi @Snerdify we've decided not to add validation for this function. All the remaining functions to do already have open PRs. |
To be more explicit I will also lock the issue for new comment but let it open until we merged the remaining PRs. |
The last PRs have been merged. We can now close this issue. Thanks a lot everyone ! |
PR #22722 introduced a decorator to validate the parameters of functions. We now need to use it for all functions where it is applicable.
Please open one PR per function. The title of the PR must mention which function it's dealing with. We recommend using the following pattern for titles:
where
<function>
is a placeholder to be replaced with the function you chose.The description of the PR must begin with
Towards #24862
so that this issue and the PR are mutually crossed-linked.Steps
Chose a public function that is documented in https://scikit-learn.org/dev/modules/classes.html. Check in the source code if the function contains some manual parameter validation (i.e. you should see some
if
condition and error raising). In case there is no validation in the function, you can report it in the issue where we will decide whether or not to skip the function.To validate the function, you need to decorate it with the decorator
sklearn.utils._param_validation.validate_params
. Do not rely only on the docstring of the estimator to define it: although it can help, it's important to primarily rely on the implementation to find the valid values because the docstring might not be completely accurate. The decorator take a Python dictionary as input where each key corresponds to a parameter name and the value corresponds to the associate constraints. You can find an example forkmeans_plusplus
belowscikit-learn/sklearn/cluster/_kmeans.py
Lines 63 to 74 in 2e481f1
_parameter_constraints
attribute).All existing simple param validation can now be removed. (simple means that does not depend on the input data or that does not depend on the value of another parameter for instance).
Tests that check error messages from simple param validation can also be removed (carefully: we need to keep the tests checking for more complex param validation !).
Finally, add the function to the list of the common param validation test
scikit-learn/sklearn/tests/test_public_functions.py
Lines 11 to 13 in 2e481f1
and make sure the test passes:
pytest -vl sklearn/tests/test_public_functions.py
Functions already updated:
See "details" in section 1
Be aware that you can see an up-to-date list at the following link: https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/tests/test_public_functions.py#L132
The text was updated successfully, but these errors were encountered: