
[WIP] ENH estimator FreezeWrap to stop it being cloned/refit #9464

Closed
wants to merge 15 commits

Conversation

@jnothman (Member) commented Jul 30, 2017

This is yet another take on freezing, in which we provide a wrapper, unlike #8374 where we simply overwrite key methods. The advantage is that it is less magical and hence less inherently dangerous. But it also makes it much harder to have the frozen object match all the properties of the contained estimator (e.g. _estimator_type, methods, instance checks).

TODO:

  • more tests of fit overloading
  • test clone changes
  • add an example of transfer learning using FreezeWrap, and mention in freeze and semi-supervised docs
  • test SelectFromModel.prefit deprecation
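To make the discussion concrete, here is a minimal sketch of the wrapper idea (hypothetical, not the code in this PR; the PR additionally adjusts clone so that the wrapped, already-fitted estimator is not cloned away):

```python
from sklearn.base import BaseEstimator


class FreezeWrap(BaseEstimator):
    """Hold an already-fitted estimator and make fit a no-op (sketch)."""

    def __init__(self, estimator):
        self.estimator = estimator  # assumed to be fitted already

    def fit(self, X=None, y=None, **fit_params):
        # Freezing: ignore the training data, keep the fitted estimator.
        return self

    def fit_transform(self, X, y=None, **fit_params):
        return self.estimator.transform(X)

    def transform(self, X):
        return self.estimator.transform(X)

    def predict(self, X):
        return self.estimator.predict(X)

    def predict_proba(self, X):
        return self.estimator.predict_proba(X)
```

Whether fitted attributes such as classes_ or coef_ should also be mirrored onto the wrapper is part of the discussion below.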

@jnothman (Member Author)

Again, votes for or against the more magical #8374 are welcome

@amueller (Member)

tests not passing ;)

I generally prefer meta-estimator solutions because they are easy for us, pretty safe, and very flexible. @agramfort and @GaelVaroquaux tend not to like them because they find them hard to use / inspect.

The main use case for freezing is inside a meta-estimator, so here we're adding another level of indirection.
One could argue that people aren't interested in digging that deep: if I have already fit a model, frozen it, and put it in a CalibratedClassifierCV, do I really need to access it programmatically afterwards?

API question: should we freeze the estimator we get if prefit=True, or deprecate prefit=True and expect the user to freeze? I don't see anything wrong with supporting prefit=True and it would be easier for users.
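For illustration, here are the two routes side by side, using the hypothetical FreezeWrap sketched above with SelectFromModel (whose prefit option is the existing API). The frozen route assumes fitted attributes such as coef_ are mirrored onto the wrapper, and that clone leaves the frozen model intact, both of which are discussed further down:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

X, y = make_classification(random_state=0)
pretrained = LogisticRegression().fit(X, y)

# Existing API: tell the meta-estimator the model is already fitted.
X_reduced = SelectFromModel(pretrained, prefit=True).transform(X)

# Freezing route (if prefit were deprecated): the user freezes the fitted
# model explicitly; SelectFromModel.fit then leaves it untouched.
X_reduced = SelectFromModel(FreezeWrap(pretrained)).fit(X, y).transform(X)
```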

@amueller (Member)

I think this is my preferred solution.

@jnothman (Member Author) commented Aug 1, 2017 via email

@amueller (Member) commented Aug 1, 2017

There is no benefit that I can see in deprecating cv='prefit' in calibration, but we should certainly deprecate prefit in SelectFromModel.

Fair.

@jnothman (Member Author) commented Aug 1, 2017 via email

@jnothman (Member Author) commented Aug 9, 2017

I'm now not altogether sure whether freezing is the right solution for transfer learning and semi-supervision... The problem is that you can't search over the parameters of a frozen estimator (to optimise some target supervised prediction accuracy, for instance), something which I assume is desirable. Rather, you want some kind of FixedDataWrapper(estimator, X, y, memory='/tmp/blah') (i.e. fix the training data for the estimator, but allow its parameters to vary; cache the fitted models).
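A rough sketch of what such a FixedDataWrapper could look like (the name and signature come from the comment above; everything else is hypothetical):

```python
from joblib import Memory
from sklearn.base import BaseEstimator, clone


def _fit_on_fixed_data(estimator, X, y):
    return clone(estimator).fit(X, y)


class FixedDataWrapper(BaseEstimator):
    """Fit the wrapped estimator on data fixed at construction (sketch).

    The inner estimator's parameters can still be varied and searched over,
    while the data it is trained on never changes; fitted models are cached
    via joblib when a cache location is given.
    """

    def __init__(self, estimator, X, y=None, memory=None):
        self.estimator = estimator
        self.X = X
        self.y = y
        self.memory = memory

    def fit(self, X=None, y=None):
        fit = _fit_on_fixed_data
        if self.memory is not None:
            fit = Memory(self.memory, verbose=0).cache(fit)
        # Ignore the data passed in; use the fixed training data instead.
        self.estimator_ = fit(self.estimator, self.X, self.y)
        return self

    def transform(self, X):
        return self.estimator_.transform(X)
```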

But that doesn't solve the problem of making a pre-trained model appear as part of something. Do we have good examples that motivate freezing???

@amueller (Member)

You mean searching over parameters that would change training? That would be quite different, yes.
My motivating examples are still things that have prefit, and putting fitted models in pipelines / cloning them.

Is there a reason this implementation doesn't mirror all attributes?
If we provide a freeze function, we can set all the attributes etc. on the instance to match the base estimator, right?

Another idea that might be possible would be to create new frozen variant classes based on the estimator we get. That should allow us to cleanly override the fit methods, but also mirror all other functionality, allow isinstance checks, etc. Also, the class would have a name that is specific to the estimator but clearly shows that it's frozen.
Though if we pre-define them we can't freeze third-party estimators, and if we try to create them on the fly, I'm not sure we could get them to pickle.
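A sketch of the on-the-fly variant, which also shows the pickling concern (this freeze helper is hypothetical, not an existing API):

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression


def freeze(estimator):
    """Swap in a dynamically created Frozen* subclass whose fit is a no-op."""
    cls = estimator.__class__
    frozen_cls = type(
        'Frozen%s' % cls.__name__,
        (cls,),
        {'fit': lambda self, X=None, y=None, **kw: self},
    )
    estimator.__class__ = frozen_cls  # fitted attributes are untouched
    return estimator


X, y = make_classification(random_state=0)
frozen = freeze(LogisticRegression().fit(X, y))

isinstance(frozen, LogisticRegression)   # True: behaves like the base class
type(frozen).__name__                    # 'FrozenLogisticRegression'
frozen.fit(X[:10], y[:10])               # no-op; coef_ is unchanged
# pickle.dumps(frozen) raises PicklingError: the class was created on the fly
# and cannot be looked up by name, which is exactly the concern above.
```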

@jnothman (Member Author) commented Aug 13, 2017 via email

@amueller (Member)

I think if we go with this solution we should copy over the dict.
@GaelVaroquaux is offline for the next couple of weeks, and I think we should include him in the discussion.

Frozen variant classes are not a reasonable option as far as I'm concerned.

Can you elaborate?
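On the "copy over the dict" point above, a sketch of what that would mean for the wrapper approach (using the hypothetical FreezeWrap from earlier):

```python
def freeze(estimator):
    """Wrap a fitted estimator and mirror its attributes onto the wrapper."""
    wrapper = FreezeWrap(estimator)
    # Copy the fitted (and constructor) attributes, e.g. coef_ and classes_,
    # so code that inspects the estimator keeps working on the wrapper.
    wrapper.__dict__.update(estimator.__dict__)
    wrapper.estimator = estimator  # ensure the reference is not clobbered
    return wrapper
```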

@jnothman (Member Author) commented Aug 13, 2017 via email

@jnothman (Member Author)

Closing for now
