Run common checks on estimators with non default parameters #17441
base: main
Conversation
@amueller Investigating some of these failures, and likely fixing the corresponding estimators (or common tests), could also be a nice project for MLH (or sprints). It is also possible to run the common checks for a particular estimator with all of its detected parameters.
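(A plausible way to do that, assuming the generated checks are collected in `sklearn/tests/test_common.py`, is pytest's `-k` filter, e.g. `pytest sklearn/tests/test_common.py -k LogisticRegression`; the exact file and test id pattern used by this PR are assumptions here.)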
Wow! Though yes, trying meta-estimators with different types of underlying estimator per #11324 would be especially/additionally valuable.
This is really cool. If we want to run this in CI, we would need some way to skip some combinations because of invalid inputs. This can be involved for LogisticRegression (penalty x solver valid combinations).
Yes, good point. I have now skipped the checks if Iris couldn't be fitted with the combination of parameters, as that likely means that they are incompatible. Now there are 22554 tests and "only" 502 failures.
Indeed. Actually, so far this only runs tests for classifiers and regressors. I'll add other types of non-meta estimators shortly. For meta-estimators, yeah, that would be interesting, but probably a whole new project, as we can't realistically run all estimators for each combination of parameters for each meta-estimator. That could actually be a good use case for hypothesis (#13846) to randomly sample in the parameter space we want to check.
categorical_params = [
    "solver",
    "algorithm",
    "loss",
    "strategy",
    "selection",
    "criterion",
    "multi_class",
    "kernel",
]
bool_params = [
    "fit_prior",
    "fit_intercept",
    "positive",
    "normalize",
    "dual",
    "average",
    "shuffle",
]
how about using type hints and getting these from there?
Yeah, we could probably try to auto-detect this in the future, but this PR already has a lot of moving parts, and I prefer to manually list parameters to check for now.
There are two aspects to my comment which I kinda failed to better explain:
- it's nice to get the info from type hints cause it's nice ;)
- it allows better extending this functionality to third party estimators.
My personal motivation behind my comment was the second one. I'm happy to leave these hard-coded here instead of adding type hints in our estimators in this PR, but it would be really nice if this PR also introduced detecting the type from the type hints, unless you think it'd make this PR too big/complicated to review.
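As an illustration of the type-hint idea, here is a rough sketch of how boolean parameters could be discovered from `__init__` annotations; `AnnotatedEstimator` and `detect_bool_params_from_hints` are made-up names, since scikit-learn estimators are not annotated this way in this PR:

```python
from typing import get_type_hints


class AnnotatedEstimator:
    """Hypothetical estimator with annotated constructor parameters."""

    def __init__(self, fit_intercept: bool = True, tol: float = 1e-4):
        self.fit_intercept = fit_intercept
        self.tol = tol


def detect_bool_params_from_hints(estimator_cls):
    """Return the names of constructor parameters annotated as bool."""
    hints = get_type_hints(estimator_cls.__init__)
    return [name for name, hint in hints.items() if hint is bool]


print(detect_bool_params_from_hints(AnnotatedEstimator))  # ['fit_intercept']
```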
I have looked into auto-detecting it at least for booleans, but then you end up with a long list of parameters to exclude, e.g.
['verbose', 'copy', 'copy_X', 'warm_start', 'compute_score', 'compute_labels', 'validate', 'accept_sparse', 'check_inverse', 'copy_X_train', 'add_indicator', 'skip_complete', 'fit_path']
and I have only gone through estimators from A to D. Each of those shouldn't impact the model results but would add to the total runtime. So I think it's easier to whitelist parameter names rather than blacklist them in this case. It could still be possible to use this in contrib projects, by passing the whitelist as an input parameter to the detect_all_params function (or just monkeypatching the categorical_params / bool_params lists for now).
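For a contrib project, the monkeypatching route could look roughly like this; only the list names come from this PR, the import path is an assumption:

```python
# Hypothetical test setup in a scikit-learn-contrib project: extend the
# whitelists before the parametrized common checks are generated.
# The module path below is an assumption, not confirmed by this PR.
from sklearn.utils import estimator_checks

estimator_checks.categorical_params.append("metric")
estimator_checks.bool_params.append("whiten")
```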
Added checks for clusterers and transformers; there are now 58.6k added tests.
The idea with meta-estimators would be to check with base estimators that have a range of different properties / APIs, not to run it against all base estimators.
    return False


def detect_valid_categorical_params(
would it make sense to do this by introducing type hints for categorical params? Or by parsing the docstring?
For type annotations we would need PEP 586 literal types and a backport of them for Python <3.8. That's probably indeed the best solution long term.
I'm not very keen on parsing docstrings, as that can break easily and is less reliable than the current approach. For instance, for estimators taking metric names as input, not all valid names are always listed in the docstring; sometimes there is just a link to the user guide.
FWIW the typing-extensions package is an official source of backports too. And Hypothesis' introspection can inspect typing or typing_extensions.Literal[...] and automatically sample from the parameters 😁
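A small example of what that could look like; `TinyEstimator` is a made-up class, and resolving `Literal` through `st.from_type` is, to my understanding, built into reasonably recent Hypothesis versions:

```python
from typing import Literal  # use typing_extensions.Literal on Python < 3.8

from hypothesis import given, strategies as st


class TinyEstimator:
    """Hypothetical estimator with a Literal-annotated categorical parameter."""

    def __init__(self, solver: Literal["lbfgs", "liblinear", "saga"] = "lbfgs"):
        self.solver = solver


# Hypothesis resolves Literal[...] to a strategy over the allowed values.
@given(solver=st.from_type(Literal["lbfgs", "liblinear", "saga"]))
def test_solver_values(solver):
    assert TinyEstimator(solver).solver in {"lbfgs", "liblinear", "saga"}
```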
We are working toward having typing in #17799.
Moving forward, we are going to run into invalid parameter combinations when sampling from the parameter space. Libraries such as ConfigSpace use a combination of conditional clauses and forbidden clauses to help define the relationships between parameters.
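As a sketch of what that could look like with ConfigSpace (the API usage below reflects my understanding of the library, reusing the LogisticRegression penalty x solver example from earlier; treat the details as assumptions):

```python
import ConfigSpace as CS
import ConfigSpace.hyperparameters as CSH

# Hypothetical search space for LogisticRegression-like parameters.
cs = CS.ConfigurationSpace(seed=0)
penalty = CSH.CategoricalHyperparameter("penalty", ["l1", "l2", "elasticnet"])
solver = CSH.CategoricalHyperparameter("solver", ["lbfgs", "liblinear", "saga"])
cs.add_hyperparameters([penalty, solver])

# Forbid combinations the estimator rejects, e.g. lbfgs only supports l2.
cs.add_forbidden_clause(
    CS.ForbiddenAndConjunction(
        CS.ForbiddenEqualsClause(solver, "lbfgs"),
        CS.ForbiddenInClause(penalty, ["l1", "elasticnet"]),
    )
)

# Sampled configurations respect the forbidden clauses.
print(cs.sample_configuration())
```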
Thanks for the references! I think using hypothesis to sample this parameter space could be interesting long term, as it could also generalize to float parameters, and it could limit the runtime somewhat once the exhaustive combination of parameters becomes intractable.
This adds a mechanism for running common checks on estimators with non-default parameters. It works as follows: for a hard-coded whitelist of categorical parameters ("solver", "algorithm", "loss", "strategy", "selection", "criterion", "multi_class", "kernel") and boolean parameters ("fit_prior", "fit_intercept", "positive", "normalize", "dual", "average", "shuffle"), valid values are detected for each estimator and the common checks are run with each of them.
The determination of valid parameter values for an estimator is a bit of a hack, but it works in most cases (with a few hard-coded special cases).
The list of detected parameters can be obtained with
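As an illustration of that heuristic (a candidate value is kept only if the estimator can still be fitted on a small dataset such as Iris, per the discussion above), here is a rough sketch; this is not the actual implementation of the PR's detection helper, just an approximation:

```python
from sklearn.base import clone
from sklearn.datasets import load_iris


def detect_valid_values(estimator, param, candidate_values):
    """Keep the candidate values for which the estimator can fit Iris.

    Hedged reconstruction of the detection heuristic, not this PR's code.
    """
    X, y = load_iris(return_X_y=True)
    valid = []
    for value in candidate_values:
        est = clone(estimator).set_params(**{param: value})
        try:
            est.fit(X, y)
        except Exception:
            # Invalid or incompatible parameter value: skip it.
            continue
        valid.append(value)
    return valid
```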
This generates 26424 additional tests (the scikit-learn test suite has 16052 tests) which run in 11 min. The run time can likely be improved by using pytest-xdist on multi-core CPUs.
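For instance, `pytest -n auto` (the standard pytest-xdist option, not something added by this PR) would spread the collected checks over one worker per available CPU core.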
I think passing these checks would both help detect new bugs and make our common checks more robust.
This is a somewhat orthogonal solution to #11324 although they both address the same problem.
There are 2858 errors / 26424. I haven't checked the errors in detail and some might be due to parameter detection failure (in particular some parameters might be incompatible). The logs (without tracebacks) can be found in pytest.log.
cc @NicolasHug @thomasjpfan @jeremiedbb @jnothman
TODO: