[WIP] Add new feature StackingClassifier #7427
Conversation
…redict_log_proba'
No cross-validation will be performed when cv=1
When cv=1 (no cross-validation), fit one sub-estimator per job. Otherwise, use the n_jobs option of `cross_val_predict`.
Stacking classifier implemented. All suggestions are welcome. I'm also considering adding the ability to set estimators in …
This looks nice. Can you add an illustrative example?
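For instance, a minimal usage sketch along these lines could serve as the illustrative example. It assumes the constructor signature visible in this diff (estimators as a list of (name, estimator) pairs plus a meta_estimator and cv); the import path of the proposed class is only a guess.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import StackingClassifier  # assumed location of the proposed class

X, y = load_iris(return_X_y=True)

eclf = StackingClassifier(
    # Base estimators whose (out-of-fold) predictions feed the meta-estimator.
    estimators=[('lr', LogisticRegression()),
                ('rf', RandomForestClassifier(n_estimators=50)),
                ('nb', GaussianNB())],
    # Meta-estimator trained on the stacked predictions.
    meta_estimator=LogisticRegression(),
    cv=5)
eclf.fit(X, y)
print(eclf.predict(X[:5]))
```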
class StackingClassifier(BaseEstimator, ClassifierMixin):
    """ Stacking classifier for combining unfitted estimators
I know what you mean by unfitted but I feel it is a bit awkward here and in the next sentence.
For integer/None inputs, if the estimator is a classifier and ``y`` is
either binary or multiclass, :class:`StratifiedKFold` is used. In all
other cases, :class:`KFold` is used.
In all other cases? If y is a different format, I'd say
-------
self : object
"""
if isinstance(y, np.ndarray) and len(y.shape) > 1 and y.shape[1] > 1:
maybe use _type_of_target? not sure
Could you please explain what you mean by _type_of_target?
sklearn.utils.multiclass.type_of_target
thanks
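For context, `type_of_target` inspects `y` and reports its format as a string, which is what the revised check later in this diff relies on. A small illustration of its standard outputs:

```python
from sklearn.utils.multiclass import type_of_target

print(type_of_target([0, 1, 1, 2]))            # 'multiclass'
print(type_of_target([[1, 0, 1], [0, 1, 0]]))  # 'multilabel-indicator'
print(type_of_target([[1, 2], [3, 1]]))        # 'multiclass-multioutput'
```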
raise NotImplementedError('Multilabel and multi-output'
                          ' classification is not supported.')
if self.estimators is None or len(self.estimators) == 0:
How about `if not self.estimators`? Also, please add a `got {}` to the error.
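Something along these lines is presumably what is meant; this is a hypothetical standalone version of the check, not code from the PR, with the received value echoed in the message:

```python
def _check_estimators(estimators):
    # Hypothetical helper mirroring the suggested validation: reject None
    # and empty sequences, and report what was actually received.
    if not estimators:
        raise AttributeError('Invalid `estimators` attribute: expected a '
                             'non-empty list of (name, estimator) tuples, '
                             'got {!r}'.format(estimators))
    return estimators
```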
if not is_classifier(self.meta_estimator):
    raise AttributeError('Invalid `meta_estimator` attribute, '
                         '`meta_estimator` should be a classifier')
maybe also print the class name?
The meta-estimator to combine the predictions of each individual
estimator
method : string, optional, default='predict_proba'
I think "default" should be "auto", which is predict_proba if it exists and decision_function otherwise.
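A sketch of how such an 'auto' default might be resolved; this helper is illustrative, not code from the PR:

```python
def _resolve_method(estimator, method='auto'):
    # Hypothetical helper: 'auto' means predict_proba when available,
    # otherwise decision_function, as suggested above.
    if method != 'auto':
        return method
    if hasattr(estimator, 'predict_proba'):
        return 'predict_proba'
    if hasattr(estimator, 'decision_function'):
        return 'decision_function'
    raise ValueError('{} supports neither predict_proba nor '
                     'decision_function'.format(type(estimator).__name__))
```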
raise ValueError('Underlying estimator `{0}` does not '
                 'support `{1}`.'.format(name, param))
self.le_ = LabelEncoder()
`self.le_ = LabelEncoder().fit(y)`?
self.meta_estimator_.fit(scores, transformed_y, **kwargs)
return self
def _form_meta_inputs(self, predicted):
maybe this doesn't need to be a separate method? maybe just a loop in _est_predict or something?
@amueller I have updated the code to reflect your suggestions. Though I figured …
@amueller, do you mean show an example of its usage including the graphs in this issue's conversation?
I mean an example to show how to use it and what it does. I haven't really put any thought into it ;)
Sure, you can update it with _BaseComposition after #7674 is merged.
Hi, I have been reviewing the work done by @yl565 and it is impressive. I really like how it is organized and I think it is really well done. However, I would like to add one functionality which may be key to the implementation: some problems, especially Kaggle ones, require training thousands of classifiers; that's why I think the current implementation is a bit monolithic.

It would be nice to be able to generate the train and test meta-predictors separately, in order to store them to disk and retrieve them in the future to either create a next layer or combine them using one meta-model. So, what I mean is that, for example, if the meta_estimator parameter of the StackingClassifier class is None, the fit method would return an object with an attribute containing a matrix of all the training meta-predictors, and the predict method would return a matrix with all the predictions of the models trained using the whole training set. In both cases it would be done in the same column order, ensuring a correct link between the datasets.

What do you think about it? Does it make sense to you? If so, I can help develop that functionality. Best,
@ivallesp, I'm not sure I completely understand what you have in mind but it seems to me the …
Hi Iván, are you mostly talking about collapsing, for the sake of prediction, a set …
Sorry, it seems I was not very clear. Stacked generalization is based on generating, from a list of models, training meta-predictors by producing out-of-fold predictions with k-fold and stacking them into vectors (meta-features; one per model) with the same length as the training set. That is what the cross_val_predict method does. The second step consists of training the models on the whole training set to generate the test meta-predictors. On top of this, a meta-model is trained in order to intelligently combine the meta-predictors into a more powerful prediction which, theoretically, will at worst be as good as the best of your individual predictions.

What I mean is that it would be really interesting to be able, once several models are trained and the new training and test sets composed of meta-predictors are generated, to retrieve these matrices (or datasets) in order to treat them in a different way. I mean, being able to stop just before applying the meta-model. That way, the user would be able to store these meta-predictors for the training and the test set in order to, for example, build a new stacked generalization (a new layer) on top of this. Another example would be appending the meta-predictors to the original training and test set and building a model that combines the meta-predictors and the original features.

In addition, I would remark that sometimes it may be useful to predict a transformed target variable; for example, in the case of a skewed target variable in a regression problem, you can build a stacker using 20 models with the original target variable, 20 more with the log of the target variable, and 20 more using the Box-Cox transform of it. That way the user would be able to combine these 60 meta-predictors and build a meta-model that combines them. For that, we would need to get the training-set-associated meta-predictors and the test-set-associated meta-predictors.

Am I now being clearer? If not, or not completely, please do not hesitate to let me know and I will try to add more examples.
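The workflow described above can already be sketched with existing scikit-learn pieces. The helper below is purely illustrative (its name and signature are not part of the PR) and shows how the training and test meta-feature matrices could be produced and returned without applying any meta-model:

```python
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import cross_val_predict


def make_meta_features(estimators, X_train, y_train, X_test, cv=5):
    """Illustrative helper: return (train_meta, test_meta) matrices with one
    block of probability columns per base estimator, in the same order."""
    train_blocks, test_blocks = [], []
    for est in estimators:
        # Out-of-fold predictions for the training set avoid leakage.
        train_blocks.append(cross_val_predict(clone(est), X_train, y_train,
                                              cv=cv, method='predict_proba'))
        # Refit on the whole training set to produce the test meta-predictors.
        test_blocks.append(clone(est).fit(X_train, y_train)
                           .predict_proba(X_test))
    return np.hstack(train_blocks), np.hstack(test_blocks)
```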
Right. I'm not able to give this enough attention to familiarise myself …
I will add the …
thank you!
I'm thinking about something like …
I think it's fine to assume applying meta is false. Just describe the …
Add to your todo list: narrative documentation (in doc/) and an example in examples/ comparing voting classifier with a couple of stacking meta-classifiers (although the real boon here is that it can be used for regression too)
We probably want a …
You've not really tested the `cv != 1` case.
There should also be a test that the estimators are cloned (i.e. the original inputs are unaffected).
Otherwise, this is looking pretty good!
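A sketch of the kind of cloning test meant here, assuming the constructor arguments used elsewhere in this PR and that StackingClassifier is importable from the module under review; the assertion relies on fitted scikit-learn classifiers exposing classes_:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB


def test_estimators_are_cloned():
    # Fitting the stacker must not fit the estimator instances passed in.
    X, y = load_iris(return_X_y=True)
    clf1, clf2, meta = LogisticRegression(), GaussianNB(), LogisticRegression()
    StackingClassifier(estimators=[('lr', clf1), ('nb', clf2)],
                       meta_estimator=meta).fit(X, y)
    for est in (clf1, clf2, meta):
        assert not hasattr(est, 'classes_')
```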
""" | ||
if any(s in type_of_target(y) for s in ['multilabel', 'multioutput']): | ||
raise NotImplementedError('Multilabel and multi-output' | ||
' classification is not supported.') |
why not?
self.le_ = LabelEncoder().fit(y)
self.classes_ = self.le_.classes_

transformed_y = self.le_.transform(y)
I'm not certain why we need to do this.
self.classes_ = self.le_.classes_

transformed_y = self.le_.transform(y)
if self.cv == 1:  # Do not cross-validate
this won't work if cv is an array, though that is unlikely.
delayed(_parallel_fit)(clone(clf),
                       X, transformed_y, kwargs)
    for _, clf in self.estimators)
scores = self._est_predict(X)
I don't feel `scores` is the best name. Perhaps `y_pred` or `y_score` or `predictions` or `Xt`.
self.meta_estimator_.fit(scores, transformed_y, **kwargs)
return self

def _form_meta_inputs(self, clf, predicted):
call this `clean_scores` or `clean_predictions` or whatever?
def test_sample_weight():
    """Tests sample_weight parameter of StackingClassifier"""
with nosetests, docstrings in test functions make test transcripts harder to read. make this a comment instead.
def test_classify_iris():
    """Check classification by majority label on dataset iris."""
with nosetests, docstrings in test functions make test transcripts harder to read. make this a comment instead.
def test_predict_on_toy_problem():
    """Manually check predicted class labels for toy dataset."""
with nosetests, docstrings in test functions make test transcripts harder to read. make this a comment instead.
y = np.array([1, 1, 1, 2, 2, 2])

assert_equal(all(clf1.fit(X, y).predict(X)), all([1, 1, 1, 2, 2, 2]))
do you mean `assert_array_equal`? `all` will return a boolean, so you're asserting the equality of booleans here.
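Reusing the `clf1`, `X` and `y` already defined in this test, the intended assertion is presumably along these lines:

```python
from numpy.testing import assert_array_equal

# Compare the predicted labels element-wise instead of comparing booleans.
assert_array_equal(clf1.fit(X, y).predict(X), [1, 1, 1, 2, 2, 2])
```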
eclf1 = StackingClassifier(
    estimators=[('lr', clf1), ('rf', clf2), ('nb', clf3)],
    meta_estimator=clfm
).fit(X, y, sample_weight=np.ones((len(y),)))
In the `cv=1` case, at least, it should be possible to test `sample_weight` as corresponding to a repetition of elements.
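One way to phrase that test (a sketch only; constructor arguments as elsewhere in this PR, and StackingClassifier assumed importable from the module under review): giving a sample a weight of 2 should yield the same predictions as duplicating that sample.

```python
import numpy as np
from numpy.testing import assert_array_equal
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB


def test_sample_weight_as_repetition():
    X, y = load_iris(return_X_y=True)

    def make_clf():
        return StackingClassifier(
            estimators=[('lr', LogisticRegression()), ('nb', GaussianNB())],
            meta_estimator=LogisticRegression(), cv=1)

    # Weight the first sample twice ...
    weights = np.ones(len(y))
    weights[0] = 2
    pred_weighted = make_clf().fit(X, y, sample_weight=weights).predict(X)

    # ... which should be equivalent to repeating it once in the data.
    X_rep = np.vstack([X[:1], X])
    y_rep = np.concatenate([y[:1], y])
    pred_repeated = make_clf().fit(X_rep, y_rep).predict(X)

    assert_array_equal(pred_weighted, pred_repeated)
```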
maybe I'm late here, but here's my two cents:

stacking is already "hard": what it *must* do is help avoid data leakage during training. this implementation looks like it's doing a lot of stuff (I just skimmed through the code, but I saw it's even label encoding stuff - I'm not sure it's a good idea to do this much here).

an idea: the first layer in the stacking can be seen as a transformer: it receives a feature set and outputs a new feature set. each classifier being stacked is independent from the others in the same layer, so it looks like a perfect situation to use the pipeline API. the real trick to turn a pipeline into a stacking classifier is the blending, and that's what's missing from sklearn. there's no need to implement a class to do what the pipeline API does, just a class to blend a classifier and make it suitable for use as a transformer.

I've implemented those ideas here: https://gist.github.com/caioaao/28bf77e9a95ae6b70b14141feacb1f84

It doesn't have tests and it's probably lacking some asserts to make it more robust, so it's not useful for a PR in sklearn, but it may be useful for comparison purposes.
thanks for the input, Caio. I agree that we're adding a lot of complexity here by providing a classifier rather than a transformer. I am not decided about whether we should require the user to construct a FeatureUnion. I think for convenience we should provide that functionality. But we should use cross_val_predict.
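To make the transformer framing concrete, here is a rough sketch in the spirit of Caio's gist (not the PR's code and not an existing scikit-learn class): fit_transform emits out-of-fold probabilities via cross_val_predict, while transform uses the estimator refitted on all training data.

```python
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.model_selection import cross_val_predict


class BlendedClassifier(BaseEstimator, TransformerMixin):
    """Illustrative wrapper that turns a classifier into a transformer."""

    def __init__(self, estimator, cv=5):
        self.estimator = estimator
        self.cv = cv

    def fit(self, X, y):
        self.estimator_ = clone(self.estimator).fit(X, y)
        return self

    def fit_transform(self, X, y):
        # Training-time features are out-of-fold predictions (no leakage).
        Xt = cross_val_predict(clone(self.estimator), X, y,
                               cv=self.cv, method='predict_proba')
        self.fit(X, y)
        return Xt

    def transform(self, X):
        # Prediction-time features come from the fully refitted estimator.
        return self.estimator_.predict_proba(X)
```

A FeatureUnion of several such wrapped classifiers followed by a final classifier in a Pipeline then behaves like a single stacking layer.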
@jnothman didn't know about cross_val_predict. that actually makes things simpler. updated the gist to use it: https://gist.github.com/caioaao/28bf77e9a95ae6b70b14141feacb1f84. there are, of course, some improvements that can be made (like being able to pass other parameters to cross_val_predict).

about requiring the user to construct a FeatureUnion: I really like a functional approach better and I think the code is clearer when you compose stuff using functions instead of creating new classes, but it doesn't stop you from, instead of using make_stack_layer, writing a class that just uses FeatureUnion under the hood, instead of doing an ad-hoc implementation that in the end provides basically the same functionality. I'm actually against this choice and think make_stacking_classifier(stacked_estimators, meta_classifier) would be cleaner, but that's a design choice that's not really aligned with the rest of sklearn's API - an example is LassoCV, RidgeCV, etc.
I don't see the relevance of LassoCV at all. But yes, convenience is preferred to functional purity.
LassoCV was just an example of a class that wraps a composition of two classes.
well you can build on _BaseComposition in any case, by merging or copying it in here. I'm inclined towards making this a transformer.
…On 9 Mar 2017 5:06 am, "Yichuan Liu" wrote:
@jnothman Since #7674 does not seem to be able to merge anytime soon, do you think it's better to remove "update with _BaseComposition after #7674 is merged" from the to-do list so we can proceed with this PR?
What's the status on this PR? I'll have some free time next week, and I was thinking of reviewing this PR.
as this looks stale, I'd really like to have a shot at implementing it as I said before. if I can do it before the weekend, would you guys mind taking it into consideration before merging/choosing this one?
Have been busy with thesis. I'll try working on it this week. I still need to incorporate #7674 into this. There is also the issue of whether or not to support multilabel and multi-output classification, which would make this PR more complicated.
well if you'd like others to contribute some of those features to your branch, just say.
I implemented my comments in #8960. As I said there, it's ready to handle several types of estimators (not just classifiers) and the implementation is simpler.
okay thanks.
hi @jnothman, I'm interested in helping out with this PR. I've manually implemented a stacked classifier for my personal project several times at this point but haven't come up with a solution that lets me do GridSearchCV with the stacked classifier. would anyone mind summarizing what's left to get this PR merged? is the to-do list at the top of the PR still accurate? EDIT: my mistake, I didn't see that #8960 was more up to date.
PR to #4816. This is a continuation of #6674.
To-do:
- transform method