-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[MRG] ENH Disassemble check estimator #14381
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG] ENH Disassemble check estimator #14381
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Please add a what's new entry as well.
sklearn/utils/estimator_checks.py
Outdated
generate_only : bool, optional (default=True) | ||
When `True`, checks are evaluated when `check_estimator` is called. | ||
When `False`, `check_estimator` generates the checks and the estimator. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add versionadded
@@ -98,17 +100,15 @@ def _rename_partial(val): | |||
|
|||
@pytest.mark.parametrize( | |||
"estimator, check", | |||
_generate_checks_per_estimator(_yield_all_checks, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can remove this function now?
sklearn/utils/estimator_checks.py
Outdated
@pytest.mark.parameterize( | ||
'estimator, check', | ||
chain.from_iterable(check_estimator(est, generate_only=False) | ||
for est in estimators)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One issues with this is that because estimator
is an instance and not a class it's not going to be rendered nicely in the list of tests. That's why we use id=_rename_partial
. I guess there is no easy way of fixing this for users...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without our _rename_partial
, pytest names everything estimator213-check41
. It could be slightly better if we yield the name as well from check_estimator
, which will result in ARDRegression-estimator211-check21
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should really try to have the tests have sensible names. This is super useless otherwise I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not fix the fixme in _rename_partial
and move it to estimator_checks
and make it public and use it here? That solves the problem, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using print_changed_only
adds newlines sometimes, which leads to pytest names with newlines in them.
Making _rename_partial
is a good idea, it would need another name, maybe nicer_check_estimator_id
? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about using print_changed_only
and remove the newlines? Also: if you're testing stuff that's that deeply nested there's no way to give it a good name.
We can not use This PR was updated to return a list of checks, rather than a generator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @thomasjpfan !
sklearn/utils/estimator_checks.py
Outdated
|
||
from itertools import chain | ||
|
||
@pytest.mark.parameterize( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pytest.mark.parameterize( | |
@pytest.mark.parametrize( |
looks good a part from typos and setting the ids. |
sklearn/utils/estimator_checks.py
Outdated
@@ -265,7 +265,31 @@ def _yield_all_checks(name, estimator): | |||
yield check_fit_idempotent | |||
|
|||
|
|||
def check_estimator(Estimator): | |||
def readable_check_estimator_ids(val): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe just set_check_estimator_ids
or make_check_estimator_ids
?
sklearn/utils/estimator_checks.py
Outdated
@@ -265,7 +265,31 @@ def _yield_all_checks(name, estimator): | |||
yield check_fit_idempotent | |||
|
|||
|
|||
def check_estimator(Estimator): | |||
def readable_check_estimator_ids(val): | |||
"""Create readable pytest ids for `check_estimator` when |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's also used internally without reference to check_estimator
.
How about
"""Create pytest ids for checks.
Returns string representations for pytest tests, to be used as id.
Use together with ``check_estimator(..., generate_only=True)``
"""
Though that makes documenting val
possibly more difficult. But on the other hand you could just directly document what val
needs to be instead of defining it implicitly.
Also: missing Returns
section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice if you actually tested the code that you recommend others will use ;)
sklearn/utils/estimator_checks.py
Outdated
|
||
from itertools import chain | ||
import pytest | ||
from sklearn.utils.estimator_check import check_estimator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from sklearn.utils.estimator_check import check_estimator | |
from sklearn.utils.estimator_checks import check_estimator |
sklearn/utils/estimator_checks.py
Outdated
from itertools import chain | ||
import pytest | ||
from sklearn.utils.estimator_check import check_estimator | ||
from sklearn.utils.estimator_check import readable_check_estimator_ids |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from sklearn.utils.estimator_check import readable_check_estimator_ids | |
from sklearn.utils.estimator_checks import readable_check_estimator_ids |
pass | ||
|
||
|
||
@pytest.mark.parametrize("val, expected", [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems to contradict the statement below ;)
Looks good apart from the pytest soft dependency issue and the name of the function. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WDYT of adding this syntactic sugar?
def parametrize_with_checks(estimators):
if hasattr(estimators, 'fit'):
estimators = [estimators]
return pytest.mark.parametrize(
['estimator', 'check'],
check_estimator(estimator, generate_only=True),
ids=set_check_estimator_ids)
Then
from sklearn.utils.estimator_checks import parametrize_with_checks
from sklearn.linear_model import LogisticRegression
@parametrize_with_checks(LogisticRegression)
def test_sklearn_compatible_estimator(estimator, check):
check(estimator)
Then we could make set_check_estimator_ids
private.
+1 though maybe with a |
Ahh yes, forgot that part from my draft implementation!
|
Having this interface actually means that we probably don't need the
awkward thing where check_estimator generates the estimator as well as the
checks.
|
For instance to programmatically generate a list of checks for a given application, manually skip a few selected ones and run the rest with |
My comment was about whether it should generate (estimator, check) pairs or
just check.
|
Currently we still need the estimator passed back because of how we do |
I don't actually think we mean the same thing... But yes, it's easier to have check_estimator generate pairs also to be unambiguous in the case of classes vs instances, etc. |
Overall I like having the syntactic sugar, but the hard dependency on
@jnothman Can you expand on this? |
It's not a hard dependency, it's a soft dependency that is required for
those who want to use this helper.
If check_estimator generates check functions only you can still construct a
generator within parametrize_with_checks that parameterizes the tests with
(estimator, check) pairs.
Can give an implementation example another time if you wish.
|
Thank you for the clarification! As you said, the concern with only generating the checks is that it becomes harder to distinguish the class checks and the instance checks. I will move forward with the |
I would assume that the target audience here will be happy to work out the
naming pattern, but sure!
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be nice to mention the new decorator in the developer guide.
@rth let's merge? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again @thomasjpfan !
Yay! Well done everyone!
|
Awesome! |
Reference Issues/PRs
Fixes #11622
Alternative to #13843
What does this implement/fix? Explain your changes.
Libraries will be able to write the following to run their tests:
Any other comments?
The
name
argument of the checks are set bycheck_estimator
.