
[RFC][MRG] ENH EstimatorCVMixin class #7305

Closed
raghavrv wants to merge 1 commit from the estimator_cv_mixin branch

Conversation

@raghavrv (Member) commented Aug 31, 2016

Adds a new EstimatorCVMixin to be used by all estimators that perform a CV search over hyperparameters.

This mixin will make the predict* methods of the fitted best_estimator_ available to the inheriting class.

@jnothman @amueller @agramfort (Up for discussion after the release of 0.18)
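
For concreteness, a rough sketch of what the mixin could provide, built on the same if_delegate_has_method decorator that GridSearchCV already uses for delegation (an illustrative sketch, not the code in this commit):

    from sklearn.utils.metaestimators import if_delegate_has_method

    class EstimatorCVMixin(object):
        # Delegate prediction to the refitted best_estimator_; the
        # decorator hides each method when the delegate lacks it.
        @if_delegate_has_method(delegate='best_estimator_')
        def predict(self, X):
            return self.best_estimator_.predict(X)

        @if_delegate_has_method(delegate='best_estimator_')
        def predict_proba(self, X):
            return self.best_estimator_.predict_proba(X)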

raghavrv force-pushed the estimator_cv_mixin branch from 4fada2b to df0faa1 on August 31, 2016 14:25
@jnothman (Member)

But most specialised CV implementations (LogisticRegressionCV, ElasticNetCV, ...) don't have estimator..?

At this point, I don't see us gaining much from such a refactoring.

@jnothman (Member)

No, you can't share duck-typing delegation so easily, nor docstrings.

@raghavrv (Member, Author)

But most specialised CV implementations (LogisticRegressionCV, ElasticNetCV, ...) don't have estimator..?

I was thinking we could add one, to unify the interface of the EstimatorCV classes... (as we would also like to have cv_results_ in them)?

@amueller (Member)

@raghavrv can you maybe focus on things that are release-relevant for this week and the next? That would be very helpful. Thanks :)

@raghavrv (Member, Author)

Yes, of course! This PR was started just as I mentioned here, and as I noted in the PR description, it is for discussion after the release of 0.18, not for now... ;)

@raghavrv (Member, Author)

BTW @amueller, just to clarify: is everything currently tagged with 0.18 release-critical? Or are there false positives in the tagging?

@amueller (Member)

there might be false positives ;)

@amueller (Member)

and false negatives

@raghavrv (Member, Author) commented Oct 16, 2016

@jnothman @amueller Could we revisit this now?

@amueller And to respond to your earlier question in the other PR:

would the new base-class also be used in other CV objects?

Currently only BaseSearchCV and, in the near future, GradientBoostingCV can use this... If this proposal goes well, BaseForestCV would also use it...

How many negative lines does that yield?

Right now, since this just refactors the mixin class out of GridSearchCV into base.py, it adds 30 lines; but with GradientBoostingCV reusing it, it could save about 100 lines of redundant code...

Additionally, could we use this to design our validation API?

@jmschrei (Member)

I'm not too familiar with this part of the code. What is the explicit benefit of this change?

amueller added this to the 0.19 milestone Oct 17, 2016
@amueller (Member)

@raghavrv I want to get out 0.18.1 first ;)
Also: why wouldn't the other existing CV estimators use it? What's different about them compared to Random Forest and Gradient Boosting?

@raghavrv (Member, Author)

@raghavrv I want to get out 0.18.1 first ;)

Déjà vu ;) But yes, I'm working on it. :)

@jnothman (Member)

Please, @raghavrv, can you add to the description precisely what you propose to be shared/provided by this mixin?

@raghavrv (Member, Author) commented Oct 25, 2016

Also: why wouldn't the other existing CV estimators use it? What's different about them compared to Random Forest and Gradient Boosting?

For instance, LogisticRegressionCV does not use best_estimator_, nor does it need predict, since it inherits from LogisticRegression. (Maybe we should do the same for GradientBoostingCV too, instead of having this mixin?)
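
As a small illustration of that inheritance pattern (not code from this PR), LogisticRegressionCV is itself a LogisticRegression, so prediction needs no delegation:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegressionCV

    X, y = make_classification(random_state=0)
    clf = LogisticRegressionCV(cv=5).fit(X, y)
    clf.predict(X)  # inherited directly from LogisticRegression
    clf.C_          # best regularization strength chosen by CV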

can you add to the description precisely what you propose to be shared/provided by this mixin?

When I opened this, I thought it would be nice to refactor the predict* functions (with their if_delegate_has_method decorators) from GridSearchCV and BaseGradientBoostingCV (#7071) into a separate mixin (EstimatorCVMixin).

Now, after Andy brought up the question of a separate validation set instead of cross-validation, I thought maybe we could use this mixin to specify a new method for passing the validation set explicitly:

import numpy as np
from sklearn.model_selection import PredefinedSplit

def validated_fit(self, X_train, y_train, X_validate, y_validate):
    X_all = np.vstack((X_train, X_validate))
    y_all = np.concatenate((y_train, y_validate))
    # -1 = always in training; 0 = the single predefined validation fold
    test_folds = np.concatenate((np.full(X_train.shape[0], -1),
                                 np.zeros(X_validate.shape[0])))
    self.cv = PredefinedSplit(test_folds)
    return self.fit(X_all, y_all)

(Not sure how sensible that train of thought was...)
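
For what it's worth, a quick check of the PredefinedSplit semantics the sketch above relies on (entries of -1 never appear in a test set, so there is exactly one train/validation split):

    import numpy as np
    from sklearn.model_selection import PredefinedSplit

    test_folds = np.array([-1, -1, -1, 0, 0])  # 3 train rows, 2 validation rows
    for train_idx, test_idx in PredefinedSplit(test_folds).split():
        print(train_idx, test_idx)  # -> [0 1 2] [3 4]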

@jnothman (Member)

I am -1 for adding new methods. And I don't think there's enough shared between the *CV classes to benefit from sharing the delegated methods.

@raghavrv (Member, Author)

Thanks for the comment :) I'll leave this closed and try to subclass from GradientBoosting for that PR...

raghavrv closed this Oct 25, 2016
raghavrv deleted the estimator_cv_mixin branch October 25, 2016 18:30