
[WIP] ENH estimator FreezeWrap to stop it being cloned/refit #9464

Closed
wants to merge 15 commits

Conversation

@jnothman (Member) commented Jul 30, 2017

This is yet another take on freezing, in which we provide a wrapper, unlike #8374 where we simply overwrite key methods. The advantage is that it is less magical and hence less inherently dangerous. But it also makes it much harder to have the frozen object match all the properties of the contained estimator (e.g. _estimator_type, methods, instance checks).

TODO:

  • more tests of fit overloading
  • test clone changes
  • add an example of transfer learning using FreezeWrap, and mention in freeze and semi-supervised docs
  • test SelectFromModel.prefit deprecation
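To make the discussion concrete, here is a minimal sketch of the wrapper idea (hypothetical, not the code in this PR; the PR additionally adjusts clone so that the wrapped, already-fitted estimator is not cloned away):

```python
from sklearn.base import BaseEstimator


class FreezeWrap(BaseEstimator):
    """Hold an already-fitted estimator and make fit a no-op (sketch)."""

    def __init__(self, estimator):
        self.estimator = estimator  # assumed to be fitted already

    def fit(self, X=None, y=None, **fit_params):
        # Freezing: ignore the training data, keep the fitted estimator.
        return self

    def fit_transform(self, X, y=None, **fit_params):
        return self.estimator.transform(X)

    def transform(self, X):
        return self.estimator.transform(X)

    def predict(self, X):
        return self.estimator.predict(X)

    def predict_proba(self, X):
        return self.estimator.predict_proba(X)
```

Whether fitted attributes such as classes_ or coef_ should also be mirrored onto the wrapper is part of the discussion below.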

@jnothman (Member Author)

Again, votes for or against the more magical #8374 are welcome

@amueller (Member)

tests not passing ;)

I generally prefer meta-estimator solutions because they are easy for us, pretty safe, and very flexible. @agramfort and @GaelVaroquaux tend not to like them because they find them hard to use / inspect.

The main use case for freezing is inside a meta-estimator, so here we're adding another level of indirection.
One could argue that people aren't interested in digging that deep: if I have already fit a model, frozen it, and put it in a CalibratedClassifierCV, do I really need to access it programmatically afterwards?

API question: should we freeze the estimator we get if prefit=True, or deprecate prefit=True and expect the user to freeze? I don't see anything wrong with supporting prefit=True and it would be easier for users.
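For illustration, here are the two routes side by side, using the hypothetical FreezeWrap sketched above with SelectFromModel (whose prefit option is the existing API). The frozen route assumes fitted attributes such as coef_ are mirrored onto the wrapper, and that clone leaves the frozen model intact, both of which are discussed further down:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

X, y = make_classification(random_state=0)
pretrained = LogisticRegression().fit(X, y)

# Existing API: tell the meta-estimator the model is already fitted.
X_reduced = SelectFromModel(pretrained, prefit=True).transform(X)

# Freezing route (if prefit were deprecated): the user freezes the fitted
# model explicitly; SelectFromModel.fit then leaves it untouched.
X_reduced = SelectFromModel(FreezeWrap(pretrained)).fit(X, y).transform(X)
```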

@amueller (Member)

I think this is my preferred solution.

@jnothman (Member Author) commented Aug 1, 2017 via email

@amueller (Member) commented Aug 1, 2017

There is no benefit that I can see in deprecating cv='prefit' in calibration, but we should certainly deprecate prefit in SelectFromModel.

Fair.

@jnothman (Member Author) commented Aug 1, 2017 via email

@jnothman (Member Author) commented Aug 9, 2017

I'm now not altogether sure whether freezing is the right solution for transfer learning and semi-supervision... The problem is that you can't search over the parameters of a frozen estimator (to optimise some target supervised prediction accuracy, for instance), something which I assume is desirable. Rather, you want some kind of FixedDataWrapper(estimator, X, y, memory='/tmp/blah') (i.e. fix the training data for the estimator, but allow its parameters to vary; cache the fitted models).
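A rough sketch of what such a FixedDataWrapper could look like (the name and signature come from the comment above; everything else is hypothetical):

```python
from joblib import Memory
from sklearn.base import BaseEstimator, clone


def _fit_on_fixed_data(estimator, X, y):
    return clone(estimator).fit(X, y)


class FixedDataWrapper(BaseEstimator):
    """Fit the wrapped estimator on data fixed at construction (sketch).

    The inner estimator's parameters can still be varied and searched over,
    while the data it is trained on never changes; fitted models are cached
    via joblib when a cache location is given.
    """

    def __init__(self, estimator, X, y=None, memory=None):
        self.estimator = estimator
        self.X = X
        self.y = y
        self.memory = memory

    def fit(self, X=None, y=None):
        fit = _fit_on_fixed_data
        if self.memory is not None:
            fit = Memory(self.memory, verbose=0).cache(fit)
        # Ignore the data passed in; use the fixed training data instead.
        self.estimator_ = fit(self.estimator, self.X, self.y)
        return self

    def transform(self, X):
        return self.estimator_.transform(X)
```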

But that doesn't solve the problem of making a pre-trained model appear as part of something. Do we have good examples that motivate freezing???

@amueller (Member)

You mean searching over parameters that would change training? That would be quite different, yes.
My motivating examples are still things that have prefit, and putting fitted models in pipelines / cloning them.

Is there a reason this implementation doesn't mirror all attributes?
If we provide a freeze function, we can set all the attributes etc. on the instance to match the base estimator, right?

Another idea that might be possible would be to create new frozen variant classes based on the estimator we get. That should allow us to cleanly override the fit methods, but also mirror all other functionality, allow isinstance checks, etc. Also, the class would have a name that is specific to the estimator but clearly shows that it's frozen.
Though if we pre-define them we can't freeze third-party estimators, and if we try to create them on the fly, I'm not sure we could get them to pickle.
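A sketch of the on-the-fly variant, which also shows the pickling concern (this freeze helper is hypothetical, not an existing API):

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression


def freeze(estimator):
    """Swap in a dynamically created Frozen* subclass whose fit is a no-op."""
    cls = estimator.__class__
    frozen_cls = type(
        'Frozen%s' % cls.__name__,
        (cls,),
        {'fit': lambda self, X=None, y=None, **kw: self},
    )
    estimator.__class__ = frozen_cls  # fitted attributes are untouched
    return estimator


X, y = make_classification(random_state=0)
frozen = freeze(LogisticRegression().fit(X, y))

isinstance(frozen, LogisticRegression)   # True: behaves like the base class
type(frozen).__name__                    # 'FrozenLogisticRegression'
frozen.fit(X[:10], y[:10])               # no-op; coef_ is unchanged
# pickle.dumps(frozen) raises PicklingError: the class was created on the fly
# and cannot be looked up by name, which is exactly the concern above.
```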

@jnothman (Member Author) commented Aug 13, 2017 via email

@amueller (Member)

I think if we go with this solution we should copy over the dict.
@GaelVaroquaux is offline for the next couple of weeks, and I think we should include him in the discussion.

Frozen variant classes are not a reasonable option as far as I'm concerned.

Can you elaborate?
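On the "copy over the dict" point above, a sketch of what that would mean for the wrapper approach (using the hypothetical FreezeWrap from earlier):

```python
def freeze(estimator):
    """Wrap a fitted estimator and mirror its attributes onto the wrapper."""
    wrapper = FreezeWrap(estimator)
    # Copy the fitted (and constructor) attributes, e.g. coef_ and classes_,
    # so code that inspects the estimator keeps working on the wrapper.
    wrapper.__dict__.update(estimator.__dict__)
    wrapper.estimator = estimator  # ensure the reference is not clobbered
    return wrapper
```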

@jnothman (Member Author) commented Aug 13, 2017 via email

@jnothman (Member Author)

Closing for now
