-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[WIP] Adding tests for estimators implementing partial_fit
and a few other related fixes / enhancements
#3907
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
That seems a nice list (although testing that models are the same currently is done by testing the output of I'm not altogether certain about the |
What is the current semantics? |
7284706
to
8bfbd75
Compare
I am stuck with the 2nd alone... The case when ( assuming that the equality checks for the accuracy ( Kindly review my current implementation of the other tests. |
c3089fd
to
ee13683
Compare
You should probably use predict/transform directly rather than score, as do the existing invariance tests. I think |
You're assuming all def _assert_same_model_method(method, X, estimator1, estimator2, msg):
if hasattr(estimator1, method):
assert hasattr(estimator2, method), '{!r} has transform, but {!r} does not'.format(estimator1, estimator2)
assert_array_almost_equal(getattr(estimator1, method)(X), getattr(estimator2, method)(X), 2, 'When testing {}: {}'.format(method, msg))
def assert_same_model(X, estimator1, estimator2, msg):
_assert_same_model_method('predict')
_assert_same_model_method('transform') |
sklearn/utils/estimator_checks.py
Outdated
@@ -386,6 +389,84 @@ def check_estimators_partial_fit_n_features(name, Alg): | |||
assert_raises(ValueError, alg.partial_fit, X[:, :-1], y) | |||
|
|||
|
|||
def check_estimators_pfit_return(name, Alg): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alg
-> Estimator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even I thought so... but this seems to be the convention followed by other checks...
Could I change all the others uniformly to Estimator
too or I'll change this one alone?
Also what does Alg
mean btw?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Algorithm ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel free to change everything to Estimator
, I might have picked the name, but don't have strong opinions about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jnothman For your 2nd comment, thanks a lot for the review... I'll make the changes :) |
@amueller I'm not quite sure. I think it should either raise an error or it should overwrite and do a new fit from the beginning, if one calls partial_fit after fit. If partial_fit starts from where fit left, then it does not make a difference whether you fit or partial_fit, right? ;) |
@MechCoder @amueller From what I observed But as you and others suggested, perhaps |
We could either.
I'm inclined towards the second one but I'm not sure what is the best thing to do. |
I feel 2b is already implemented... I guess its between 2a or 1... Anyway thanks for your comments :) |
A code revision related question... When revisions are suggested to the code, am I expected to add a new commit reflecting those revision or do I (soft) reset my previous work, make the changes and recommit? |
I think, you can do it either way. But if those revisions are minor, and if you feel it worthless to add a new commit for those (like pep8, cosmits and stuff), you can just amend your commit and push by force (at least, that's what I do). |
Thanks... I'll also follow the same :) |
562bed4
to
133778a
Compare
The point is that the interaction between |
@jnothman To your earlier comment,
Could you expand a little by what you meant? I understood the part where you said all those estimators should not be tested assuming they are predictors and that we should separately test them based on whether they are a Transformer or Predictor ... But I fail to understand the two code snippets... [ @amueller perhaps such a functionality could be useful for So for estimators that are not Transformers: And for Transformers with Excuse me if I understood your comment incorrectly... |
133778a
to
bcb342d
Compare
partial_fit
partial_fit
and tests for utils.testing.all_estimators
function.
672b06a
to
3baac44
Compare
Yes that will be really better! I'll cherry pick the helpers into a new branch and raise a PR!! |
FIX Return self in partial_fit of BernoulliRBM
de291d8
to
019abda
Compare
* Check if no error is raised when fit is called with a different number of features * Check if estimator is reset when fit is called after a partial_fit * Check if error is raised when no of features changes between different partial_fit calls * Check if partial_fit does not overwrite the previous model * Check if classes argument is parsed and validated correctly * Check if clone(estimator).partial_fit == estimator.partial_fit * Check if partial_fit returns self
019abda
to
5d2a90e
Compare
Not sure what's the state of this PR at the moment, but please consider #5416.
In the case of scalers, the issue came from the fact that we are now implementing Ping @lesteve |
partial_fit
and a few other related fixes / enhancementspartial_fit
and a few other related fixes / enhancements
Is this PR just waiting for review or is there still work to be done? |
This is gonna need a fresh start if we're going to do it. And we'd need to do significant work on the semantics of |
Fixes #3896
New
partial_fit
tests:The below tests are based off @arjoly's suggestion.
partial_fit
returnsself
. - Thanks @jnothman for the suggestion.assert clone(est).partial_fit == est.partial_fit
fit
after a set offit
/partial_fit
restarts the estimator.partial_fit
and thenfit
with a different number of features works without raising any Exceptions... - Thanks @amueller for the suggestionpartial_fit
does not overwrite the previous model.partial_fit
.classes
argument andnp.unique(y_i)
raisesValueError
.classes
is not specified during firstpartial_fit
call.classes
is not specified during subsequent calls._validate_y_against_classes
- Check label mismatch betweeny
andclasses
argassert_same_model(X, est1, est2)
assert_not_same_model(X, est1, est2)
predict
,transform
,decision_function
andpredict_proba
to check equality of models.sklearn.utils.estimator_checks
_partial_fit
and_fit
to use the appropriate parameters.assert_attributes_equal(est1, est2)
assert_attributes_not_equal(est1, est2)
assert_array_not_equal
Refs
Also see #406 - This PR fixes 1st under not so easy for pfit-able estimators, via 3a ( of this PR )
This change is