[WIP] API specify test parameters via classmethod #11324
Conversation
@@ -319,61 +319,6 @@ def _boston_subset(n_samples=200):
    return BOSTON


def set_checking_parameters(estimator):
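(set_checking_parameters is the centralised helper this hunk removes: it mutates each estimator's parameters before the common checks run, roughly along these lines. This is an illustrative sketch, not the function's actual body.)

```python
# Illustrative sketch (not scikit-learn's actual code) of the kind of tweaks
# the centralised helper applies to every estimator before the common checks.
def set_checking_parameters_sketch(estimator):
    params = estimator.get_params()
    # cap iteration counts so slow solvers finish quickly
    if isinstance(params.get("max_iter"), int):
        estimator.set_params(max_iter=min(5, params["max_iter"]))
    # keep ensembles small for speed
    if isinstance(params.get("n_estimators"), int):
        estimator.set_params(n_estimators=min(5, params["n_estimators"]))
    return estimator
```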
I suppose we need to leave this in for backwards compat?
Yes, this looks like a good idea. I might try to get the tags done first, though? That will be next week, though :-/
I think this is more-or-less orthogonal to tags.
Though it will simplify your tags implementation for meta-estimators, for instance.
More or less, yes ;) maybe there are no actual issues here. You mean by providing ways to instantiate them? Well, it will make it easier to test more classes, which will mean there'll be many more failures lol ;)
True! But it also means you might not need to have a required_parameters tag.
So you wouldn't test that all parameters are either specified by default or explicitly defined as required? We could easily miss "max_iter" not having a default value with this. Maybe not as big an issue for us, but for third parties?
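To make that concern concrete (a hypothetical estimator, not code from this PR): a constructor parameter without a default would go unnoticed if the common tests only ever use the instances the hook yields.

```python
from sklearn.base import BaseEstimator

# Hypothetical estimator illustrating the worry above: max_iter has no
# default, so BrokenEstimator() raises a TypeError, but the common tests
# would never notice if they only use the instances the hook provides.
class BrokenEstimator(BaseEstimator):
    def __init__(self, max_iter):
        self.max_iter = max_iter

    @classmethod
    def _get_test_instances(cls):
        yield cls(max_iter=5)
```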
sklearn/svm/classes.py (Outdated)
@@ -594,6 +602,10 @@ def __init__(self, C=1.0, kernel='rbf', degree=3, gamma='auto_deprecated',
            decision_function_shape=decision_function_shape,
            random_state=random_state)

    @classmethod
    def _get_test_instances(cls):
        yield cls(decision_function_shape='ovo')
This will fail, right?
Or rather: why doesn't this fail?
The current set_checking_parameters does this for *SVC.
you're right. But why? I'm sure that was me. I should get some coffee and think about that. That makes the shape of the decision function wrong, doesn't it? Are we never testing for more than 3 classes? I'm quite confused.
Did you set parameters on everything that was impacted by the old function? Or how did you determine when to set parameters? Maybe it's easier to merge this first; either way I'm unlikely to have a lot of bandwidth before next week :-/
I tried to do every edit that corresponded to the existing set_checking_parameters.
I'd really like to hear your opinion on the default parameter tests. It seems the required_parameters tag was confusing to you and @rth. But I feel it's important to make sure that third-party estimators don't have weird required parameters.
And I think leaving the function in but deprecating it doesn't really cost us much, so I'd do it.
The point is that this will also raise an error if they try to run check_estimator on the estimator unless they define params. I think this will help us clearly review any testing params required for contrib projects.
I think you'll like the last commit.
Yeah, but it requires a review, and check_estimator has wider use than contrib projects. It leaves open the possibility of someone accidentally forgetting to specify a default for a parameter that they set in testing. But maybe I'm being paranoid. It could also be that if someone doesn't know our conventions they'll just add them to the testing function and not to the constructor. But maybe the docs can clarify that.
OK, that seems like a good solution ;) you pre-empted my complaints.
It's not actually a sufficient solution. Most people won't use our test_common.py, which is the only place _get_test_instances is called. I've left myself a todo to use _get_test_instances when calling check_estimator(cls). We could also (now that it's only recently been introduced) consider deprecating calling check_estimator on an instance... but we don't need to worry about that here.
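A sketch of what that todo could look like; the wrapper name and the fallback behaviour are assumptions, not code from this PR.

```python
from sklearn.utils.estimator_checks import check_estimator

# Hypothetical wrapper: when given a class, run the common checks on every
# instance the class provides; fall back to the default constructor.
def check_estimator_class(Estimator):
    if hasattr(Estimator, "_get_test_instances"):
        instances = Estimator._get_test_instances()
    else:
        instances = [Estimator()]
    for estimator in instances:
        check_estimator(estimator)
```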
The trouble with that is supporting
The easiest way I can think of doing
Hm, there are tests that test init stuff that are not covered in clone, I think. Also, from a use perspective it seems easier to pass an instance than to add to this function in the top estimator. Would we want all possible pipelines we want to test to be hardcoded in the pipeline class? That seems weird to me, in particular if I'm testing things like NaN handling. It would be nice for a third party to be able to check whether their pipeline construct works nicely without inheriting / monkey-patching pipeline.
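The instance-based route described there might look like the following (illustrative only; check_estimator does accept an instance, though whether every common check passes for an arbitrary pipeline is a separate question).

```python
from sklearn.pipeline import make_pipeline
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.utils.estimator_checks import check_estimator

# A third party checks their own pipeline construct directly, without
# hardcoding it in the Pipeline class or monkey-patching anything.
check_estimator(make_pipeline(SimpleImputer(), LogisticRegression()))
```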
Maybe I'm sold on the
I'm not adamant about my solution at all, will think about your idea later (have to run). Currently the distinction between class-level tests and instance-level tests is pretty ugly. I was not happy with that at my last refactoring. But I didn't see a good way to check
Do we want to have this in the release? I was kinda hoping for estimator tags; not sure if I have time to finish it, though...
It would be nice to have all these things... I'm not sure we can justify a delay to the release, though!
@jnothman do you mind if I try solving the merge conflicts so we can move this forward?
@NicolasHug I'm not sure there's consensus on this. I think I'd rather do the meta-estimator initialization by using instance-based testing as in #9741 instead of class-based testing. The main downside of that is that some estimators (like …). Having a decentralized architecture is better than the centralized architecture we have right now, but in the end I feel having "testing parameters" is a bit like playing VW, because we're not actually testing the defaults.
Maybe we need a SLEP / discussion on if/how to remove set_checking_parameters.
@amueller Could you elaborate on that? Say I want to run common tests on an estimator with different solvers (#14063); I see how it can be done with this PR but not with instance-based approaches (short of manually creating appropriate tests).
I think @amueller is saying that this is more or less equivalent to just calling check_estimator on a bunch of instances of the estimator, with different solvers for example, and doing so in the estimator's test module.
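Concretely, that approach could look like this in the estimator's own test module (an illustrative sketch, not existing scikit-learn code).

```python
import pytest
from sklearn.linear_model import LogisticRegression
from sklearn.utils.estimator_checks import check_estimator

# Hypothetical test covering the different solver code paths
# without any class-level hook.
@pytest.mark.parametrize("solver", ["liblinear", "lbfgs", "saga"])
def test_common_checks_per_solver(solver):
    check_estimator(LogisticRegression(solver=solver))
```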
I agree that would be the most straightforward approach. Is there a reason we haven't been doing that so far? There shouldn't be issues for non-meta-estimators, I think?
@rth I think the reason we haven't done it is that the last time I was working on this the estimator tags were not merged, and so it was not easily possible. Doing instance-level checks is my preferred solution for checking different code paths. However, it doesn't get rid of set_checking_parameters.
One alternative to running common tests is for check_estimator to log which estimator classes have been run through it, and for us to check at the end of testing that all sklearn classes are on that log...
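A sketch of that logging idea; the names and the wiring are assumptions, with sklearn.utils.testing.all_estimators used to enumerate the classes to cover.

```python
from sklearn.utils.testing import all_estimators
from sklearn.utils.estimator_checks import check_estimator

# Sketch only: check_estimator calls go through a wrapper that records
# each class, and a final test asserts nothing in sklearn was skipped.
_CHECKED = set()

def logged_check_estimator(estimator):
    cls = estimator if isinstance(estimator, type) else type(estimator)
    _CHECKED.add(cls)
    check_estimator(estimator)

def test_every_estimator_was_checked():
    # would have to run after everything else, e.g. from a session fixture
    missing = {cls for _, cls in all_estimators()} - _CHECKED
    assert not missing
```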
That would still allow a lot of hackery, though. I feel like ideally we shouldn't need to tweak the estimators to pass the tests.
Yes... The tags should help adapt to the tests. But some of the checking params are there just to make the tests faster...
One way to fix the issue is to have more "auto" values for the params. The estimators would ideally set them properly when a small dataset is passed.
@adrinjalali that's only sort-of true. For example, if we're checking the API we don't need algorithms to converge, so we set …
@jnothman I wasn't necessarily talking about tags. My issue is that the point of the tests is to make sure the estimators behave correctly. Any mechanism that allows us to set
So do we need fast and slow runs of the tests?
And the fast runs use the method in this PR? In principle that seems like a reasonable solution. Running with default params on the cron job and with "fast" params on PRs? That would not resolve the issues with weird defaults (like SelectKBest), but that part could be resolved by being "smart" with "auto"? Does that mean
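One way such a split could be wired up, purely as an illustration: the environment variable and helper below are made up, and _get_test_instances is the hook from this PR.

```python
import os

# Purely illustrative wiring for a fast/slow split: PR runs use the cheap
# per-class configurations, the cron job uses plain defaults.
RUN_SLOW = os.environ.get("SKLEARN_SLOW_CHECKS", "0") == "1"

def instances_to_check(Estimator):
    if not RUN_SLOW and hasattr(Estimator, "_get_test_instances"):
        yield from Estimator._get_test_instances()
    else:
        yield Estimator()
```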
This is an initial implementation of what I suggested in #8022 (comment) (ping @amueller).

The idea is to put the test configuration for an estimator class on the class. This provides:

- … setting max_iter, etc., is repetitive (this PR adds lines and decentralises functionality)

The key changes are to base.py, test_common.py and estimator_checks.py.

TODO:

- use _generate_test_params when calling check_estimator on a class
- check_parameters_default_constructible
- check_no_attributes_set_in_init
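The description doesn't show the base.py side; the shape implied by the SVC diff above would be roughly a default hook like the one below, which estimators override when they need non-default test configurations. This is an assumption, not the PR's actual code.

```python
# Hypothetical default hook on the base class: one default-constructed
# instance. Classes like SVC override it to yield the configurations they
# want the common tests to exercise.
class BaseEstimatorHookSketch:
    @classmethod
    def _get_test_instances(cls):
        yield cls()
```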