
[WIP] API specify test parameters via classmethod #11324


Open
jnothman wants to merge 10 commits into base: main

Conversation

@jnothman (Member) commented Jun 20, 2018

This is an initial implementation of what I suggested in #8022 (comment) (ping @amueller).

The idea is to put the test configuration for an estimator class on the class. This provides:

  • advantage: can instantiate meta-estimators which have required args
  • advantage: can see the test parametrisation clearly on the class
  • disadvantage: code limiting max_iter, etc., is repetitive (this PR adds lines and decentralises functionality)
  • disadvantage: harder for reviewers to point out to a contributor that these changes are needed

The key changes are to base.py, test_common.py and estimator_checks.py
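
Concretely, the classmethod looks roughly like this (a minimal sketch with a hypothetical meta-estimator; `_get_test_instances` is the name used in this PR's diff, and the exact semantics are still under discussion):

```python
from sklearn.base import BaseEstimator, MetaEstimatorMixin
from sklearn.linear_model import LogisticRegression


class MyMetaEstimator(MetaEstimatorMixin, BaseEstimator):
    """Hypothetical meta-estimator with a required argument."""

    def __init__(self, estimator):  # required arg, no default
        self.estimator = estimator

    @classmethod
    def _get_test_instances(cls):
        # The class declares how the common tests should instantiate it,
        # so meta-estimators with required args can still be checked.
        yield cls(estimator=LogisticRegression())
```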

TODO:

  • documentation
  • use _generate_test_params when calling check_estimator on a class
  • replace check_parameters_default_constructible
  • work out how to still do check_no_attributes_set_in_init
  • test exception when running check_estimator on class requiring parameters

@jnothman added the API label Jun 20, 2018
@@ -319,61 +319,6 @@ def _boston_subset(n_samples=200):
return BOSTON


def set_checking_parameters(estimator):
@jnothman (Member Author):

I suppose we need to leave this in for backwards compat?

@jnothman changed the title from "API specify test parameters via classmethod" to "[MRG] API specify test parameters via classmethod" Jun 20, 2018
@amueller (Member)

yes, this looks like a good idea. I might try to get the tags done first, though? Will be next week, though :-/

@jnothman (Member Author)

I think this is more-or-less orthogonal to tags.

@jnothman (Member Author)

Though it will simplify your tags implementation for meta-estimators, for instance.

@amueller (Member)

More or less, yes ;) Might be there are no actual issues here. You mean by providing ways to instantiate them? Well, it will make it easier to test more classes, which will mean there'll be many more failures lol ;)

@jnothman (Member Author) commented Jun 20, 2018 via email

@amueller (Member)

So you wouldn't test that all parameters are either specified by default or explicitly defined as required? We could easily miss "max_iter" not having a default value with this. Maybe not as big an issue for us but for third parties?
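
(For context, such a check can be written with `inspect` — a hypothetical sketch, not the actual `check_parameters_default_constructible` implementation:)

```python
import inspect


def check_params_have_defaults(Estimator, required=()):
    # Hypothetical check: every __init__ parameter either has a default
    # value or is explicitly declared as required (e.g. via a tag).
    sig = inspect.signature(Estimator.__init__)
    for name, param in sig.parameters.items():
        if name == "self" or param.kind in (param.VAR_POSITIONAL,
                                            param.VAR_KEYWORD):
            continue
        if param.default is param.empty and name not in required:
            raise AssertionError(
                f"{Estimator.__name__}.__init__ parameter {name!r} has no "
                f"default and is not declared as required"
            )
```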

@@ -594,6 +602,10 @@ def __init__(self, C=1.0, kernel='rbf', degree=3, gamma='auto_deprecated',
decision_function_shape=decision_function_shape,
random_state=random_state)

@classmethod
def _get_test_instances(cls):
yield cls(decision_function_shape='ovo')
Member:

This will fail, right?

Member:

Or rather: why doesn't this fail?

@jnothman (Member Author):

The current set_checking_parameters does this for *SVC

Member:

You're right. But why? I'm sure that was me. I should get some coffee and think about that. That makes the shape of the decision function wrong, doesn't it? Are we never testing for more than 3 classes? I'm quite confused.

@amueller (Member)

Did you set parameters on everything that was impacted by the old function? Or how did you determine when to set parameters? Maybe it's easier to merge this first; either way I'm unlikely to have a lot of bandwidth before next week :-/

@jnothman (Member Author)

I tried to do every edit that corresponded to the existing set_checking_parameters. I've deprecated that function now, but left the portable logic in, on the basis that some implementers may have used it.

@amueller (Member)

I'd really like to hear your opinion on the default parameter tests. It seems the tag for that was confusing to you and @rth. But I feel it's important to make sure that third party estimators don't have weird required parameters.

@amueller (Member)

And I think leaving the function in but deprecating doesn't really cost us much so I'd do it.

@jnothman (Member Author) commented Jun 20, 2018 via email

@jnothman (Member Author)

> And I think leaving the function in but deprecating doesn't really cost us much so I'd do it.

Done. See https://github.com/scikit-learn/scikit-learn/pull/11324/files#diff-a95fe0e40350c536a5e303e87ac979c4R323

@jnothman (Member Author)

I think you'll like the last commit.

@amueller (Member)

> The point is that this will also raise an error if they try to run check_estimator on the estimator unless they define params. I think this will help us clearly review any testing params required for contrib projects.

Yeah, but it requires a review, and check_estimator has wider use than contrib projects. It leaves open the possibility of someone accidentally forgetting to specify a default for a parameter that they set in testing. But maybe I'm being paranoid.

It could also be that if someone doesn't know our conventions they'll just add them to the testing function and not to the constructor. But maybe the docs can clarify that.

@amueller (Member)

Ok, that seems like a good solution ;) you pre-empted my complaints.
Needs dev docs, otherwise looks good (though I didn't go through all the cases).
Let's do this before tags then.

@jnothman changed the title from "[MRG] API specify test parameters via classmethod" to "[WIP] API specify test parameters via classmethod" Jun 20, 2018
@jnothman (Member Author) commented Jun 20, 2018 via email

@jnothman (Member Author)

The trouble with that is supporting check_no_attributes_set_in_init. Also, this PR should aim to replace all of check_parameters_default_constructible.

@jnothman (Member Author)

The easiest way I can think of doing check_no_attributes_set_in_init is to make _get_test_instances return a dict of params, rather than an instance, and have the construction happen in test_common/check_estimator.
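
(A rough sketch of that variant; `_generate_test_params` on a hypothetical meta-estimator and the helper below are illustrative, not the actual implementation:)

```python
import inspect

from sklearn.base import BaseEstimator, MetaEstimatorMixin
from sklearn.linear_model import LogisticRegression


class MyMetaEstimator(MetaEstimatorMixin, BaseEstimator):
    def __init__(self, estimator):
        self.estimator = estimator

    @classmethod
    def _generate_test_params(cls):
        # Yield kwargs dicts rather than instances, so the construction
        # itself happens inside test_common/check_estimator.
        yield {"estimator": LogisticRegression()}


def check_no_attributes_set_in_init(Estimator, params):
    # Sketch: construct from the yielded kwargs, then verify __init__
    # stored nothing beyond its own parameters.
    est = Estimator(**params)
    init_params = set(inspect.signature(Estimator.__init__).parameters) - {"self"}
    assert set(vars(est)) <= init_params, (
        "__init__ set attributes other than its parameters")
```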

@amueller (Member)

Hm, there are tests that test init stuff that are not covered by clone, I think.
I need to double check.

Also, from a use perspective it seems easier to pass an instance than to add to this function in the top estimator. Would we want all possible pipelines we want to test to be hardcoded in the Pipeline class? That seems weird to me, in particular if I'm testing things like NaN handling.

It would be nice for a third party to be able to check whether their pipeline construct works nicely without inheriting from / monkey-patching Pipeline.

@jnothman (Member Author)

Maybe I'm sold on the requires_params tag. I tried changing check_parameters_default_constructible to accept some params and make sure that either these or the defaults are set.... but maybe that's silly.

@amueller (Member) commented Jun 20, 2018

I'm not adamant about my solution at all, will think about your idea later (have to run).
I thought your error message was good, but if it's hard to implement that's another thing. I think it might be reasonable to assume that people don't start with implementing _generate_test_params, so the test will be meaningful in most cases.

Currently the distinction between class-level tests and instance-level tests is pretty ugly. I was not happy with that at my last refactoring. But I didn't see a good way to check default_constructible but also pass in complex meta-estimators that might not "belong" to a specific class.

@amueller (Member)

Do we want to have this in the release? I was kinda hoping for estimator tags, not sure if I have time to finish it, though...

@jnothman (Member Author)

It would be nice to have all these things... I'm not sure we can justify a delay to release, though!

@NicolasHug (Member)

@jnothman do you mind if I try solving the merge conflicts so we can move this forward?

@amueller (Member)

@NicolasHug I'm not sure there's consensus on this.

I think I'd rather do the meta-estimator initialization by using instance-based testing as in #9741 instead of class-based testing.

The main downside of that is that some estimators (like SelectKBest and GaussianRandomProjection) don't pass the common tests with their default parameters, while other estimators take "a long time" with their default parameters.

Having a decentralized architecture is better than the centralized architecture we have right now, but in the end I feel having "testing parameters" is a bit like playing VW because we're not actually testing the defaults.
I'm not sure if it is worth changing the defaults so they work, but that also seems a bit arbitrary. Or changing the tests so they work?

@amueller (Member)

Maybe we need a SLEP / discussion on if/how to remove set_checking_parameters.

@rth (Member) commented Jul 31, 2019

> I think I'd rather do the meta-estimator initialization by using instance-based testing as in #9741 instead of class-based testing.

@amueller Could you elaborate on that? Say I want to run common tests on an estimator with different solvers (#14063): I see how it can be done with this PR but not with instance-based approaches (short of manually creating appropriate tests).

@jnothman (Member Author) commented Jul 31, 2019 via email

@rth (Member) commented Jul 31, 2019

> this is more or less equivalent to just calling check_estimator on a bunch of instances of the estimator, with different solvers for example, and doing so in the estimator's test module.

I agree that would be the most straightforward approach. Is there a reason we haven't been doing that so far? There shouldn't be issues for non-meta-estimators, I think?
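
(For illustration, the instance-based approach would look something like this in an estimator's own test module, assuming check_estimator accepts instances; the test name is made up:)

```python
import pytest

from sklearn.linear_model import LogisticRegression
from sklearn.utils.estimator_checks import check_estimator


@pytest.mark.parametrize(
    "estimator",
    [LogisticRegression(solver=solver)
     for solver in ("lbfgs", "liblinear", "saga")],
)
def test_common_per_solver(estimator):
    # Run the common checks on concrete instances, one per solver,
    # from the estimator's own test module.
    check_estimator(estimator)
```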

@amueller (Member)

@rth I think the reason we haven't done it is that the last time I was working on this the estimator tags were not merged, and so it was not easily possible.

Doing instance-level checks is my preferred solution to checking different code paths. However, it doesn't get rid of set_checking_parameters. I think it's still good to run everything through the common tests so we don't forget anything. But the problem is that some estimators are slow by default and some estimators don't pass the tests by default. Neither of these issues is solved by explicit instance-based testing.

@jnothman (Member Author) commented Aug 1, 2019 via email

@amueller (Member) commented Aug 2, 2019

That would still allow a lot of hackery, though. I feel like ideally we shouldn't need to tweak the estimators to pass the tests.

@jnothman (Member Author) commented Aug 3, 2019 via email

@adrinjalali (Member)

> That would still allow a lot of hackery, though. I feel like ideally we shouldn't need to tweak the estimators to pass the tests.

One way to fix the issue is to have more "auto" values for the params. The estimators would ideally set them properly when a small dataset is passed.

@amueller (Member) commented Aug 12, 2019

@adrinjalali that's only sort-of true. For example, if we're checking the API we don't need algorithms to converge, so we set max_iter or n_iter to something small.

@jnothman I wasn't necessarily talking about tags. My issue is that the point of the tests is to make sure the estimators behave correctly. Any mechanism that allows us to set max_iter to make the algorithm run faster also allows us to hide issues with the default parameters or work around bugs, i.e. cheat the tests.
From that perspective, this PR basically implements the Volkswagen method.

@jnothman (Member Author) commented Aug 12, 2019 via email

@amueller (Member)

And the fast runs use the method in this PR?

In principle that seems like a reasonable solution. Running with default params on the cron job and with "fast" params on PRs?

That would not resolve the issues with weird defaults (like SelectKBest), but that part could be resolved by being "smart" with "auto"?

Does that mean check_estimator gets a use_testing_parameters argument that propagates to everything, together with the strict_mode parameter?
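
(Purely hypothetical sketch of what that might look like; neither use_testing_parameters nor run_all_checks exists in the library:)

```python
def run_all_checks(estimator, strict_mode=True):
    ...  # stand-in for the individual yield-based checks


def check_estimator(Estimator, *, use_testing_parameters=True,
                    strict_mode=True):
    # Hypothetical signature: when use_testing_parameters is False,
    # ignore any test-parameter overrides on the class and run every
    # check with the defaults (e.g. on the cron job); PR builds would
    # keep the fast testing parameters.
    if use_testing_parameters and hasattr(Estimator, "_generate_test_params"):
        instances = [Estimator(**p) for p in Estimator._generate_test_params()]
    else:
        instances = [Estimator()]
    for instance in instances:
        run_all_checks(instance, strict_mode=strict_mode)
```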
