
[WIP] Adding tests for estimators implementing partial_fit and a few other related fixes / enhancements #3907


Closed
raghavrv wants to merge 2 commits into scikit-learn:main from ragv:partial_fit_testing_3896

Conversation

raghavrv
Member

@raghavrv raghavrv commented Nov 29, 2014

Fixes #3896

New partial_fit tests:
The tests below are based on @arjoly's suggestion (a sketch of a few of these checks follows the list).

1. General tests
   - Assert that partial_fit returns self. - Thanks @jnothman for the suggestion.
   - Clone test... assert clone(est).partial_fit == est.partial_fit
2. Check that an error is raised if the number of features changes.
3. Estimator reset test
   - Check that calling fit after a series of fit / partial_fit calls restarts the estimator.
   - Assert that partial_fit followed by fit with a different number of features works without raising any exceptions... - Thanks @amueller for the suggestion
4. Check that partial_fit does not overwrite the previous model.
5. Check that a classifier correctly handles the classes argument of partial_fit.
   a. Check that a mismatch between the classes argument and np.unique(y_i) raises a ValueError.
   b. Check that an error is raised if classes is not specified during the first partial_fit call.
   c. Check that no error is raised if classes is not specified during subsequent calls.

Helper functions:
- _validate_y_against_classes - check for label mismatch between y and the classes arg
- assert_same_model(X, est1, est2)
- assert_not_same_model(X, est1, est2)
  - Use predict, transform, decision_function and predict_proba to check equality of models.
- In sklearn.utils.estimator_checks, make _partial_fit and _fit use the appropriate parameters.
- assert_attributes_equal(est1, est2)
- assert_attributes_not_equal(est1, est2)
- assert_array_not_equal
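
For concreteness, here is a minimal sketch of checks 1, 2 and 5b, using SGDClassifier as a stand-in (an editor's illustration of the intended behaviour, not code from this PR; the real checks live in sklearn.utils.estimator_checks):

import numpy as np
from sklearn.base import clone
from sklearn.linear_model import SGDClassifier

X = np.random.RandomState(0).rand(30, 4)
y = np.array([0, 1, 2] * 10)

# 1. partial_fit should return self, so that calls can be chained.
est = SGDClassifier(random_state=0)
assert est.partial_fit(X, y, classes=np.unique(y)) is est

# 2. Changing the number of features between calls should raise a ValueError.
try:
    est.partial_fit(X[:, :-1], y)
except ValueError:
    pass
else:
    raise AssertionError('expected a ValueError on a feature-count change')

# 5b. The first partial_fit call on a fresh estimator must receive `classes`.
try:
    clone(est).partial_fit(X, y)
except ValueError:
    pass
else:
    raise AssertionError('expected a ValueError when classes is missing')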

Refs
Also see #406 - this PR fixes the 1st item under "not so easy" for partial_fit-able estimators, via 3a (of this PR).



@coveralls

Coverage Status

Coverage remained the same when pulling 79a620b on ragv:partial_fit_testing_3896 into 4f1c381 on scikit-learn:master.

@jnothman
Member

That seems a nice list (although testing that models are the same is currently done by testing the output of predict/transform).

I'm not altogether certain about the fit then partial_fit semantics: fit does not necessarily leave the model in a state in which incremental training can occur. I do think we should decide what the correct semantics are.

@amueller
Member

amueller commented Dec 2, 2014

What are the current semantics?
Should fit always leave the model in a state where partial_fit is possible?

@raghavrv raghavrv force-pushed the partial_fit_testing_3896 branch 2 times, most recently from 7284706 to 8bfbd75 Compare December 4, 2014 23:36
@raghavrv
Member Author

raghavrv commented Dec 5, 2014

I am stuck on the 2nd one alone... The assumption that fit then partial_fit == partial_fit then partial_fit does not seem to hold consistently.

(assuming the equality check compares accuracy (score))

Kindly review my current implementation of the other tests.

@raghavrv raghavrv force-pushed the partial_fit_testing_3896 branch from c3089fd to ee13683 Compare December 5, 2014 01:40
@jnothman
Member

jnothman commented Dec 5, 2014

You should probably use predict/transform directly rather than score, as do the existing invariance tests.

I think fit, partial_fit == partial_fit, partial_fit is the wrong semantics. I'd rather have fit, partial_fit => exception, just to be clear.

@jnothman
Member

jnothman commented Dec 5, 2014

You're assuming all partial_fit estimators are predictors. Some are or may be transformers. Perhaps we should use something like:

from numpy.testing import assert_array_almost_equal


def _assert_same_model_method(method, X, estimator1, estimator2, msg):
    # Only compare outputs when the first estimator exposes the method.
    if hasattr(estimator1, method):
        assert hasattr(estimator2, method), \
            '{!r} has {}, but {!r} does not'.format(estimator1, method, estimator2)
        assert_array_almost_equal(
            getattr(estimator1, method)(X), getattr(estimator2, method)(X),
            2, 'When testing {}: {}'.format(method, msg))


def assert_same_model(X, estimator1, estimator2, msg):
    _assert_same_model_method('predict', X, estimator1, estimator2, msg)
    _assert_same_model_method('transform', X, estimator1, estimator2, msg)
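
As a hedged usage sketch (reusing the X, y fixtures from the earlier example; est1/est2 are illustrative names), the clone check from the list above could then read:

est1 = SGDClassifier(random_state=0).partial_fit(X, y, classes=np.unique(y))
est2 = clone(est1).partial_fit(X, y, classes=np.unique(y))
assert_same_model(X, est1, est2, 'clone + partial_fit should give an identical model')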

@@ -386,6 +389,84 @@ def check_estimators_partial_fit_n_features(name, Alg):
assert_raises(ValueError, alg.partial_fit, X[:, :-1], y)


def check_estimators_pfit_return(name, Alg):
Member

Alg -> Estimator

Member Author

Even I thought so... but this seems to be the convention followed by the other checks...
Should I change all the others uniformly to Estimator too, or change this one alone?
Also, what does Alg mean, btw?

Member

Algorithm ;)

Member

Feel free to change everything to Estimator, I might have picked the name, but don't have strong opinions about it.

Member Author

Ah... okay ! Will change it :)

@raghavrv
Member Author

raghavrv commented Dec 5, 2014

@jnothman partial_fit after a fit does not raise an exception... Could that be a bug?

For your 2nd comment, thanks a lot for the review... I'll make the changes :)

@MechCoder
Member

Should fit always leave the model in a state where partial_fit is possible?

@amueller I'm not quite sure. I think it should either raise an error or overwrite and do a new fit from the beginning if one calls partial_fit after fit. If partial_fit starts from where fit left off, then it makes no difference whether you fit or partial_fit, right? ;)

@raghavrv
Member Author

raghavrv commented Dec 5, 2014

@MechCoder @amueller From what I observed, fit always restarts the estimator, while partial_fit continues from the previous state, irrespective of whether that state came from a fit or a partial_fit.

But as you and others suggested, perhaps partial_fit after fit should raise an exception?
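
(Editor's sketch of the observed behaviour, using SGDClassifier as an assumed example; not code from this PR:)

import numpy as np
from sklearn.linear_model import SGDClassifier

X = np.random.RandomState(0).rand(20, 3)
y = np.array([0, 1] * 10)

est = SGDClassifier(random_state=0)
est.fit(X, y)
w = est.coef_.copy()

# partial_fit silently continues from the fitted state (no exception)...
est.partial_fit(X, y)
assert not np.allclose(w, est.coef_)

# ...whereas a second fit restarts from scratch and reproduces the first fit.
est.fit(X, y)
assert np.allclose(w, est.coef_)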

@MechCoder
Member

We could do one of two things:

  1. Raise an exception.
  2. Overwrite the previously fit model if partial_fit is called after fit, or fit is called after partial_fit.

I'm inclined towards the second one, but I'm not sure what the best thing to do is.

@raghavrv
Member Author

raghavrv commented Dec 5, 2014

I feel 2b is already implemented... I guess it's between 2a or 1... Anyway, thanks for your comments :)

@raghavrv
Member Author

raghavrv commented Dec 5, 2014

A code-revision-related question... When revisions are suggested to the code, am I expected to add a new commit reflecting those revisions, or do I (soft) reset my previous work, make the changes and recommit?

@MechCoder
Member

I think you can do it either way. But if those revisions are minor and you feel it's not worth adding a new commit for them (like pep8, cosmetics and stuff), you can just amend your commit and force-push (at least, that's what I do).

@raghavrv
Member Author

raghavrv commented Dec 5, 2014

Thanks... I'll also follow the same :)

@raghavrv raghavrv force-pushed the partial_fit_testing_3896 branch 2 times, most recently from 562bed4 to 133778a Compare December 6, 2014 11:08
@jnothman
Member

jnothman commented Dec 6, 2014

The point is that the interaction between fit and partial_fit hasn't been previously addressed. The assumption is that the user will choose one or the other, but not both. Forcing a particular behaviour by way of the proposed tests will make the matter more consistent, but is also perhaps more of a framework than is actually needed, because the case is somewhat pathological.

@raghavrv
Member Author

raghavrv commented Dec 6, 2014

@jnothman To your earlier comment,

Perhaps we should use something like {code}

Could you expand a little by what you meant?

I understood the part where you said those estimators should not all be tested on the assumption that they are predictors, and that we should test them separately based on whether they are a Transformer or a Predictor...

But I fail to understand the two code snippets...
If you meant that we could filter out Transformer-like estimators which implement partial_fit... I just modified method_filter to accept ! as a prefix to the method names...

[ @amueller perhaps such a functionality could be useful for type_filter too? ]

So for estimators that are not Transformers:
ests = all_estimators(method_filter=["predict", "!transform", "partial_fit"])

And for Transformers with partial_fit capability :
ests = all_estimators(method_filter=["transform", "partial_fit"], type_filter="transformer")
# The type filter is perhaps redundant here

Excuse me if I understood your comment incorrectly...
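
(Editor's sketch of the proposed ! prefix semantics; matches_method_filter is a hypothetical helper, not the actual implementation:)

def matches_method_filter(estimator_cls, method_filter):
    # "!name" excludes estimators implementing `name`; plain names are required.
    for name in method_filter:
        if name.startswith('!'):
            if hasattr(estimator_cls, name[1:]):
                return False
        elif not hasattr(estimator_cls, name):
            return False
    return True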

@raghavrv raghavrv force-pushed the partial_fit_testing_3896 branch from 133778a to bcb342d Compare December 7, 2014 12:34
@raghavrv raghavrv changed the title [WIP]Adding tests for estimators implementing partial_fit [WIP]Adding tests for estimators implementing partial_fit and tests for utils.testing.all_estimators function. Dec 7, 2014
@raghavrv raghavrv force-pushed the partial_fit_testing_3896 branch 3 times, most recently from 672b06a to 3baac44 Compare December 12, 2014 17:32
@raghavrv
Member Author

@amueller I added some tests for the all_estimators function, as discussed here. Kindly take a look when you find time... I've also added ! prefixing for type_filter and method_filter....

( The pfit tests are WIP though... )

@raghavrv
Member Author

Yes, that would be much better! I'll cherry-pick the helpers into a new branch and raise a PR!!

FIX Return self in partial_fit of BernoulliRBM
* Check that no error is raised when fit is called with a different number of features
* Check that the estimator is reset when fit is called after a partial_fit
* Check that an error is raised when the number of features changes between partial_fit calls
* Check that partial_fit does not overwrite the previous model
* Check that the classes argument is parsed and validated correctly
* Check that clone(estimator).partial_fit == estimator.partial_fit
* Check that partial_fit returns self
@giorgiop
Contributor

Not sure what the state of this PR is at the moment, but please consider #5416.
It would be nice to have a common test over all estimators with partial_fit to check whether two subsequent calls to fit (not partial) behave well. For example:

est.fit(X)
# Different input shape; this may raise errors if the internal state is not reset.
est.fit(X[:, :-1])
# This should give the same result as the first line.
est.fit(X)

In the case of scalers, the issue came from the fact that we are now implementing fit as a single call to partial_fit, without a reset first.
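
(A hedged sketch of such a common check, using StandardScaler as an assumed example; an editor's illustration, not code from this PR:)

import numpy as np
from numpy.testing import assert_array_almost_equal
from sklearn.preprocessing import StandardScaler

X = np.random.RandomState(0).rand(20, 4)

est = StandardScaler()
first = est.fit(X).transform(X)

# Refitting with a different input shape must not raise: fit should reset state.
est.fit(X[:, :-1])

# A third fit on the original data should reproduce the first result.
assert_array_almost_equal(first, est.fit(X).transform(X))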

Ping @lesteve

@raghavrv
Member Author

@giorgiop Work is going on at #4841, where new assert helpers to compare models are being written... Once that is done, the fit-reset tests will go in, and later we'll get this PR in (hopefully) :)

@raghavrv raghavrv changed the title [MRG] Adding tests for estimators implementing partial_fit and a few other related fixes / enhancements [WIP] Adding tests for estimators implementing partial_fit and a few other related fixes / enhancements Dec 11, 2015
@amueller amueller modified the milestones: 0.18, 0.19 Sep 22, 2016
@amueller amueller modified the milestone: 0.19 Jun 12, 2017
Base automatically changed from master to main January 22, 2021 10:48
@haiatn
Contributor

haiatn commented Jul 29, 2023

Is this PR just waiting for review or is there still work to be done?

@adrinjalali
Member

This is gonna need a fresh start if we're going to do it. And we'd need to do significant work on the semantics of partial_fit.

@adrinjalali adrinjalali closed this Mar 7, 2024
Development

Successfully merging this pull request may close these issues.

Invariance testing for partial_fit