Implications of FrozenEstimator on our API #29893

Open
thomasjpfan opened this issue Sep 19, 2024 · 15 comments
thomasjpfan (Member) commented Sep 19, 2024

With #29705, we have a simple way to freeze estimators, which means there is no need for cv="prefit". This also opens the door for #8350 to make Pipeline and FeatureUnion follow our conventions. This issue is to discuss the API implications of introducing FrozenEstimator. Here are the two I had in mind:

cv="prefit"

For the cv case, users pass a frozen estimator directly into cv:

from sklearn.calibration import CalibratedClassifierCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.frozen import FrozenEstimator

rf = RandomForestClassifier()
rf.fit(X_train, y_train)
frozen_rf = FrozenEstimator(rf)

calibration = CalibratedClassifierCV(frozen_rf)
calibration.fit(X_calib, y_calib)

Making this change lets us simplify the codebase by deprecating and eventually removing the cv="prefit" option.

compose.Pipeline

We introduce a new compose.Pipeline which follows our conventions with clone. (The current pipeline.Pipeline does not clone.)

from sklearn.compose import ColumnTransformer, Pipeline  # compose.Pipeline is the proposed class
from sklearn.frozen import FrozenEstimator
from sklearn.linear_model import LogisticRegression

prep = ColumnTransformer(...)
prep.fit(X_train, y_train)
frozen_prep = FrozenEstimator(prep)

pipe = Pipeline([frozen_prep, LogisticRegression()])

pipe.fit(X_another, y_another)

In both cases, I prefer the semantics of FrozenEstimator.

@github-actions github-actions bot added the Needs Triage Issue requires triage label Sep 19, 2024
@thomasjpfan thomasjpfan added API RFC and removed Needs Triage Issue requires triage labels Sep 19, 2024
adrinjalali (Member) commented

Was just talking with @glemaitre about this and we also very much agree with these points (or at least I thought Guillaume agrees 😅 ). I'd like to have the FrozenEstimator PR merged, and then open follow up PRs to deal with these implications.

This also deserves a @scikit-learn/core-devs ping.

glemaitre (Member) commented

In addition to cv="prefit", we also have the prefit=True option for SelectFromModel.

So in general, I think that using FrozenEstimator is the right way because:

  • it is explicit which estimator is frozen
  • it is less brittle with nested objects (e.g. a pipeline in a column transformer inside a cross-validation). We never really got this right in the past because it is easy to miss a clone somewhere.
  • it removes a parameter option that is probably only of interest to "advanced" users, reducing complexity a bit
  • with cv="prefit", the semantics of the parameter and value are not straightforward without reading the documentation (a CV is not really prefitted)
  • it simplifies the code internally because cloning is handled by FrozenEstimator and no longer by each meta-estimator.
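The cloning point can be illustrated with a minimal, self-contained sketch of the freezing idea (this is not sklearn's actual implementation; all names here are illustrative): a wrapper whose fit is a no-op and which delegates everything else to the already-fitted inner estimator, so meta-estimators can call fit freely without touching it.

```python
class FrozenSketch:
    """Illustrative stand-in for FrozenEstimator, not sklearn's code."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X=None, y=None):
        # Deliberately do nothing: the inner, already-fitted estimator
        # stays exactly as-is no matter who calls fit on the wrapper.
        return self

    def __getattr__(self, name):
        # Delegate predict, transform, fitted attributes, etc.
        # to the wrapped estimator.
        return getattr(self.estimator, name)


class AlreadyFitted:
    """Stand-in for an estimator that was fitted beforehand."""

    def predict(self, X):
        return [0 for _ in X]


frozen = FrozenSketch(AlreadyFitted())
frozen.fit([[1.0]], [1])               # no-op, inner estimator untouched
print(frozen.predict([[1.0], [2.0]]))  # delegated to the inner estimator
```

Because the wrapper owns the no-op fit, no meta-estimator needs its own special-case code for prefitted sub-estimators.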

To me there is one downside:

  • we require nesting an estimator inside a new meta-estimator. However, given that this is an advanced use case, I think the added complexity is acceptable for the kind of user who will reach for this pattern.

So given the pros and cons I could come up with, I'm very much in favor of introducing FrozenEstimator and deprecating cv="prefit" and prefit=True/False.

adam2392 (Member) commented

> To me there is one downside:
>
>   • we require nesting an estimator inside a new meta-estimator. However, given that this is an advanced use case, I think the added complexity is acceptable for the kind of user who will reach for this pattern.
>
> So given the pros and cons I could come up with, I'm very much in favor of introducing FrozenEstimator and deprecating cv="prefit" and prefit=True/False.

Would it be weird to add a freeze() / get_frozen() method to BaseEstimator, so that all estimators can effectively return a FrozenEstimator without having to import FrozenEstimator and do the nesting?

Perhaps it adds too much complexity and the user is confused because then there are multiple ways of "getting a frozen estimator"?

adrinjalali (Member) commented

I wouldn't mind a def get_frozen(self): return FrozenEstimator(self) indeed. But I think we can move forward with the rest, and decide if we want to add this nicety later. Good suggestion though.
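As a hypothetical, runnable sketch of that one-liner suggestion (get_frozen() is not an adopted sklearn API, and FrozenStub here stands in for sklearn's FrozenEstimator):

```python
class FrozenStub:
    """Stand-in for sklearn's FrozenEstimator in this sketch."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X=None, y=None):
        return self  # no-op: the wrapped estimator stays as-is


class EstimatorWithGetFrozen:
    """What the proposed one-liner on BaseEstimator would look like."""

    def get_frozen(self):
        return FrozenStub(self)


est = EstimatorWithGetFrozen()
frozen = est.get_frozen()
print(frozen.estimator is est)  # True: the frozen wrapper holds the estimator
```

The appeal is that users never import a freezing helper themselves; the method returns the wrapper directly.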

adrinjalali (Member) commented

@ogrisel you might like the get_frozen() idea here if you don't want users to have to do an extra import.

ogrisel (Member) commented Oct 30, 2024

> I wouldn't mind a def get_frozen(self): return FrozenEstimator(self) indeed.

No strong opinion on whether we need it or not, but if we introduce it, I would vote in favor of a top level sklearn.freeze to be consistent with our existing sklearn.clone.

ogrisel (Member) commented Oct 30, 2024

> @ogrisel you might like the get_frozen() idea here if you don't want users to have to do an extra import.

You still need to import it (unless you already imported the top level sklearn for other things).

EDIT: this comment is wrong.

ogrisel (Member) commented Oct 30, 2024

Let's finalize the reviews of #30171 and #30172 first, and then let's open a PR to explore how sklearn.freeze (or sklearn.frozen.get_frozen) would feel in the examples.

ogrisel (Member) commented Oct 30, 2024

Oh, I misread the proposals. I did not realize the proposal above was about introducing a method instead of a function... CalibratedClassifierCV(model.freeze()).fit(X_cal, y_cal) would feel right to me. But it expands the common public API for scikit-learn estimators, possibly also for third-party estimator developers.

adrinjalali (Member) commented

For third-party estimators that inherit from BaseEstimator (which should be all of them at this point given our new requirements), they don't really need to do anything.
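The inheritance argument can be sketched in a few lines (illustrative names only, not sklearn's actual classes; the tuple stands in for wrapping with FrozenEstimator):

```python
class Base:
    """Stand-in for BaseEstimator if it grew a freezing method."""

    def get_frozen(self):
        # Subclasses inherit this automatically; the tuple stands in
        # for returning FrozenEstimator(self).
        return ("frozen", self)


class ThirdPartyEstimator(Base):
    pass  # no extra code needed by the third-party author


tag, inner = ThirdPartyEstimator().get_frozen()
print(tag)  # frozen
```

Any estimator down the inheritance chain picks up the method for free, which is why third-party authors would not have to change anything.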

glemaitre (Member) commented

I feel that a function rather than a method is more natural given the current state of our code base. It would stand nicely next to clone.

I would find it strange if cloning and freezing were not aligned in this regard (i.e. function vs. method).

adrinjalali (Member) commented

I would be more in favor of adding a clone to BaseEstimator rather than putting freeze beside clone 😁 .

I'd be somewhat against a sklearn.base.freeze as opposed to BaseEstimator.freeze, since the former requires the user to do an extra import anyway. And once we require an extra import, it could just be FrozenEstimator itself.

betatim (Member) commented Oct 30, 2024

My intuition is also that a standalone function would be nicer than adding a method.

adrinjalali (Member) commented Oct 30, 2024

If we do a function, in effect, we're changing

(1)

from sklearn.frozen import FrozenEstimator
MetaEstimator(estimator=FrozenEstimator(subestimator))

or

import sklearn
MetaEstimator(estimator=sklearn.frozen.FrozenEstimator(subestimator))

to

(2)

from sklearn import freeze
MetaEstimator(estimator=freeze(subestimator))

or

import sklearn
MetaEstimator(estimator=sklearn.freeze(subestimator))

while having a method would mean

(3)

MetaEstimator(estimator=subestimator.freeze())

To me, (1) and (2) are not different enough to warrant the function, while (3) is a nicer enough experience that it's worth doing.

glemaitre (Member) commented

> I would be more in favor of adding a clone to BaseEstimator rather than putting freeze beside clone 😁.

Since this alternative avoids an import, and looking at #30171 from a usability standpoint, I would agree that a method might be nicer to use.
