Add stacking-meta-model #6674


Closed · wants to merge 2 commits

Conversation

@tsterbak commented Apr 18, 2016

A wrapper that allows combining models into a two-stage stacking model.
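
For context, a rough sketch of the two-stage idea; this is not the code in this PR, and the base estimators, the use of cross_val_predict for out-of-fold predictions, and LogisticRegression as the meta-model are all illustrative assumptions:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)

# First stage: several base classifiers.
stage_one = [RandomForestClassifier(random_state=0), SVC(probability=True, random_state=0)]

# Out-of-fold predicted probabilities from each base classifier
# become the features for the second stage.
meta_features = np.column_stack([
    cross_val_predict(clf, X, y, cv=5, method="predict_proba")[:, 1]
    for clf in stage_one
])

# Second stage: a meta-estimator fit on the stacked predictions.
meta_model = LogisticRegression()
meta_model.fit(meta_features, y)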


@tsterbak (Author)

PR to #4816

@jnothman (Member)

Please add tests!


        return y_out

    def run_gridsearch(self,X,y,params):
Member

What's this meant to do? It doesn't look quite right for scikit-learn's API.

Author

It may not be perfect yet, but it can already be used with, for example, the cross_val_score function.
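
For reference, any object implementing the estimator API (a parameter-only __init__ plus fit and predict) can be passed to cross_val_score. Below is a toy, self-contained stacker along those lines; the class name SimpleStacker and its parameters are chosen purely for illustration and are not this PR's implementation:

import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB


class SimpleStacker(ClassifierMixin, BaseEstimator):
    """Toy two-stage stacker following the scikit-learn fit/predict API."""

    def __init__(self, base_estimators=None, meta_estimator=None):
        self.base_estimators = base_estimators
        self.meta_estimator = meta_estimator

    def fit(self, X, y):
        # Fit each first-stage model, then stack their probabilities
        # as features for the second-stage (meta) model.
        self.base_ = [clone(est).fit(X, y) for est in self.base_estimators]
        stacked = np.column_stack([est.predict_proba(X)[:, 1] for est in self.base_])
        self.meta_ = clone(self.meta_estimator).fit(stacked, y)
        return self

    def predict(self, X):
        stacked = np.column_stack([est.predict_proba(X)[:, 1] for est in self.base_])
        return self.meta_.predict(stacked)


X, y = make_classification(random_state=0)
model = SimpleStacker(
    base_estimators=[RandomForestClassifier(random_state=0), GaussianNB()],
    meta_estimator=LogisticRegression(),
)
print(cross_val_score(model, X, y, cv=3))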

Author

This evaluates the model on test data.

@jnothman (Member)

Please adhere to PEP8. Using flake8 might help.

# libraries
import numpy as np

# scikit-learn base libraries
Member

You can remove these comments

@MechCoder (Member) commented Apr 28, 2016

From the definition of the StackingClassifier here (http://machine-learning.martinsewell.com/ensembles/stacking/), self.stage_two_clfs should be self.stage_two_clf and there should be no weights argument.

I can understand the use case, but you should be applying a VotingClassifier on top of the StackingClassifier; the StackingClassifier should not be doing it internally. This would be the API for that use case:

# Classifier1/Classifier2 are placeholders for arbitrary base classifiers,
# and StackingClassifier is the class proposed in this PR.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification()
estimators1 = [Classifier1(), Classifier2()]
estimator2 = LogisticRegression()
estimator3 = RandomForestClassifier()
sc1 = StackingClassifier(estimators1, estimator2)
sc2 = StackingClassifier(estimators1, estimator3)
vc = VotingClassifier([("sc1", sc1), ("sc2", sc2)], weights=[0.3, 0.7])
vc.fit(X, y)

Also, we should rename self.stage_one_clfs to something that contains "estimators". And we should take a list of tuples, where each tuple is a (string, estimator) pair, as done in VotingClassifier, for GridSearch support. See https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/ensemble/voting_classifier.py#L227
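
To illustrate why the (string, estimator) tuples matter: the names become parameter prefixes that GridSearchCV can address via the name__param syntax, as VotingClassifier already supports. A small self-contained example (not this PR's code):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(random_state=0)

vc = VotingClassifier(estimators=[
    ("lr", LogisticRegression()),
    ("rf", RandomForestClassifier(random_state=0)),
])

# The names in the (name, estimator) tuples become parameter prefixes,
# so nested hyperparameters can be tuned directly with GridSearchCV.
grid = GridSearchCV(
    vc,
    param_grid={"lr__C": [0.1, 1.0], "rf__n_estimators": [50, 100]},
    cv=3,
)
grid.fit(X, y)
print(grid.best_params_)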


# fit the second stage models
for clf in self.stage_two_clfs:
    clf.fit(self.__X,self.__y)
@MechCoder (Member) commented Apr 28, 2016

Shouldn't this be just clf.fit(y_pred, y), according to http://machine-learning.martinsewell.com/ensembles/stacking/?

@MechCoder (Member) commented Apr 28, 2016

@tsterbak Thanks for the effort!

Please do consider adding a bit of documentation, tests, and PEP8 compliance.

@tsterbak (Author)

@MechCoder Thanks for your comments!! I will change the code asap.

@tsterbak (Author)

@MechCoder about the API:
I think you are right in some ways, but doing the voting internally and computing the first and second stages in one object is computationally more efficient: the first-stage models only have to be fit and predicted once in order to predict with all the second-stage models.

More suggestions are welcome! :)
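
A minimal sketch of that efficiency argument; the estimator choices are illustrative, and in-sample first-stage predictions are used here for brevity where out-of-fold predictions would be used in practice:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)

# First stage: fit and predict only once.
stage_one = [RandomForestClassifier(random_state=0), SVC(probability=True, random_state=0)]
first_stage_preds = np.column_stack(
    [clf.fit(X, y).predict_proba(X)[:, 1] for clf in stage_one]
)

# Second stage: every model reuses the cached first-stage predictions,
# so adding another second-stage model does not re-fit the first stage.
stage_two = [LogisticRegression(), GaussianNB()]
for clf in stage_two:
    clf.fit(first_stage_preds, y)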

@MechCoder (Member)

All right, I can have a closer look after the code is cleaned up.

@yl565 (Contributor) commented Aug 7, 2016

@MechCoder Hi, is anyone working on this currently? If not, I'd like to open a PR.

@rth (Member) commented Sep 2, 2016

Is anybody planning to work on this? Otherwise I would be happy to look into it.

@yl565 (Contributor) commented Sep 13, 2016

ping @jnothman

@MechCoder (Member)

Sure, please go ahead. There were some questions with regards to the API, but those can be discussed during the course of the pull request.

@ivallesp (Contributor) commented Sep 22, 2016

I would like to participate in this project. In fact, I almost won the BNP competition (22nd place) with my own implementation, which I would like to adapt to sklearn... If possible, I would like to work on it!!

@JivanRoquet

@ivallesp I'm currently working on a stacking (aka Super Learner) implementation too - would you like to collaborate?

@yl565 (Contributor) commented Nov 20, 2016

@JivanRoquet I already have a PR, #7427, for a stacking classifier; I'm just waiting for #7674 to be merged, after which I can use the new _BaseComposition class. If you have any suggestions, please let me know.

@JivanRoquet

@yl565 awesome, thanks! I can see you provided a StackingClassifier.

My work is related to a StackingRegressor. Your PR is very helpful in that regard, so both implementations can be consistent. I'll try to adapt what I've already done to the conventions you follow.

@ivallesp (Contributor)

@JivanRoquet Sure! I would love to collaborate on this.

@AlJohri commented Oct 3, 2017

@jnothman should this PR be closed, with #7427 being the more up-to-date one?

EDIT: the even more up-to-date version is #8960.

@amueller added the Superseded (PR has been replaced by a newer PR) label on Aug 5, 2019
Labels
Superseded (PR has been replaced by a newer PR)

9 participants