
Conversation

rasbt
Contributor

@rasbt rasbt commented Jan 26, 2015

This is an implementation of an estimator to combine scikit-learn classification estimators into an ensemble classifier for majority rule class label prediction or classification based on weighted probability averages.

I added a section at the bottom of the ensemble.rst documentation with a more detailed description and examples.

I'd be happy to make improvements or adjustments if necessary; please let me know!

@jnothman
Member

I think EnsembleClassifier is probably a bit too general. Perhaps VotedClassifier or VotingClassifier or MajorityRuleClassifier.

@rasbt
Contributor Author

rasbt commented Jan 26, 2015

I agree, Joel. I like MajorityRuleClassifier, it pretty much summarizes the concept in one phrase.

@jnothman
Member

except that often you will be relying on the highest vote, not a strict
majority

On 26 January 2015 at 16:57, Sebastian Raschka notifications@github.com wrote:

I agree, Joel. I like MajorityRuleClassifier, it pretty much summarizes the concept in one phrase.

@jnothman
Member

Not that I've read the exact algorithm.

On 26 January 2015 at 19:39, Joel Nothman joel.nothman@sydney.edu.au wrote:

except that often you will be relying on the highest vote, not a strict majority

"""
if self.weights:
avg = self.predict_proba(X)
maj = np.apply_along_axis(lambda x: max(enumerate(x), key=operator.itemgetter(1))[0], axis=1, arr=avg)
Member

Could you refactor this to two lines maybe? Or perhaps replace it with the below...

maj = np.apply_along_axis(lambda x: max(enumerate(x),
                                        key=operator.itemgetter(1))[0],
                          axis=1, arr=avg)

(PEP 8 says that the max line length should be 79 characters...)

@glouppe
Contributor

glouppe commented Jan 26, 2015

The algorithm you have implemented is actually called "soft voting" rather than "majority voting". In the former, the prediction is the argmax of the sums of predict probas, in the latter the prediction is the argmax of the sums of predictions.

By the way, a more general algorithm is "stacking", which basically learns a new estimator from the predictions of others. Shouldn't we try to go in that direction instead, with majority voting/soft voting as special cases?
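The soft/hard distinction described above can be sketched with plain NumPy (toy probabilities and class indices 0..2 are made up for illustration):

```python
import numpy as np

# Toy predicted probabilities: 3 classifiers x 2 samples x 3 classes.
probas = np.array([
    [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]],   # classifier 1
    [[0.4, 0.4, 0.2], [0.1, 0.2, 0.7]],   # classifier 2
    [[0.5, 0.2, 0.3], [0.3, 0.4, 0.3]],   # classifier 3
])

# Soft voting: argmax of the averaged (equivalently, summed) probabilities.
soft = np.argmax(probas.mean(axis=0), axis=1)

# Hard (majority) voting: each classifier casts one vote for its own
# argmax, and the most frequent class label wins.
votes = np.argmax(probas, axis=2)          # shape (n_classifiers, n_samples)
hard = np.apply_along_axis(
    lambda col: np.bincount(col, minlength=3).argmax(), axis=0, arr=votes)

# The two schemes can disagree (here they do on the second sample).
```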

@arjoly
Member

arjoly commented Jan 26, 2015

By the way, a more general algorithm is "stacking", which basically learns a new estimator from the predictions of others. Shouldn't we try to go in that direction instead, with majority voting/soft voting as special cases?

+1

@amueller
Member

It seemed to me that stacking was less used in practice, but Owen (from Databricks, Kaggle #1) said he uses it quite a bit.
I think it would be nice to have stacking, but the simple case is also helpful, right? So why not implement this first and possibly add stacking later? The implementation is pretty simple anyhow...

@rasbt
Contributor Author

rasbt commented Jan 26, 2015

The algorithm you have implemented is actually called "soft voting" rather than "majority voting". In the former, the prediction is the argmax of the sums of predict probas, in the latter the prediction is the argmax of the sums of predictions.

Hm, sounds like I implemented both at once then. If weights are provided, it is "soft voting" (based on probas), and without weights it is "majority voting" (based on class labels).

Do you think it would make sense to split the current implementation into a SoftVotingClassifier and a MajorityRuleClassifier for clarity?

About "stacking": I agree with @aemuller that this could be implemented additionally later. E.g., by using the attributes from the MajorityRuleClassifier as input for a new classifier. On the other hand, would this need an implementation? Maybe an example would be sufficient that shows how the returned predictions/probas from the MajorityRule/SoftVoting classifier can be used as input for a new classifier...

@amueller
Member

@rasbt in stacking (see ESL) the probabilities of the first stage of classifiers are obtained using out of sample estimates using cross-validation, so it is a bit more complex.
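For illustration, a rough sketch of that idea using today's scikit-learn API (which post-dates this PR; the estimator choices and cv=5 are arbitrary):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_predict

X, y = load_iris(return_X_y=True)
base = [LogisticRegression(max_iter=1000),
        DecisionTreeClassifier(random_state=0)]

# Stacking: first-stage probabilities are obtained out-of-sample via
# cross-validation, so the second stage never sees fitted-on-itself probas.
meta_features = np.hstack([
    cross_val_predict(clf, X, y, cv=5, method='predict_proba')
    for clf in base
])

# Second-stage model learns how to combine the base estimators' outputs.
meta = LogisticRegression(max_iter=1000).fit(meta_features, y)
```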

@amueller
Member

I wouldn't split this up into two classes, but I would add string options for using majority or soft voting. Maybe call it VotingClassifier and add a voting parameter that can be 'hard', 'soft', or a sequence of weights? Not sure if the name voting is very discoverable; weights seems more natural.

@amueller
Member

With your current interface it would also be possible to just add a uniform keyword, which might be simpler than specifying a constant vector.

@rasbt
Contributor Author

rasbt commented Jan 26, 2015

@amueller Good idea! I added the voting parameter which can be set to 'soft' or 'hard'.

About the uniform keyword: I don't think it is necessary. Right now, uniform weights are used by default if weights=None, which is maybe cleaner because np.average also has a weights parameter.
avg = np.average(self.probas_, axis=0, weights=self.weights)
Or would you think that an explicit 'uniform' keyword could be beneficial for clarity?
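For illustration, a toy example of how weights=None falls back to a uniform average (the probabilities are made up):

```python
import numpy as np

# Stacked per-classifier probabilities for one sample, 3 classes.
probas_ = np.array([[0.7, 0.2, 0.1],
                    [0.4, 0.4, 0.2],
                    [0.1, 0.6, 0.3]])

# weights=None gives a plain uniform average, same as probas_.mean(axis=0).
uniform = np.average(probas_, axis=0, weights=None)

# Explicit weights shift the average toward the heavier classifiers.
weighted = np.average(probas_, axis=0, weights=[2, 1, 1])
```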

self.voting = voting

if weights and len(weights) != len(clfs):
    raise ValueError('Number of classifiers and weights must be equal')
Member

maybe also give the numbers here.

@amueller
Member

I don't see where the weights are used in the prediction currently. If you let the user specify both the weights and the voting, then you need to make sure that the user doesn't shoot herself in the foot easily. So if voting='hard' and some weights are specified, they should also be used, I guess for weighted voting.

I don't think you need to add an additional keyword uniform. It is just that a user should be able to use both, hard and soft voting, without having to specify the number of classes in the estimator construction. I think that is possible with your current solution.

Something that might be tricky is that predict_proba should be consistent with predict in all cases. For hard voting that causes some minor issues, see #3891. You could just return the normalized votes but I am not a fan of that, as you throw away so much information (and your roc score will be abysmal).

Parameters
----------
clfs : array-like, shape = [n_classifiers]
A list of scikit-learn classifier objects.
Member

Same comment about redundancy.

@amueller
Member

You should add the class in sklearn.utils.testing to DONT_TEST I think.

@rasbt
Contributor Author

rasbt commented Jan 26, 2015

@amueller Thanks for the comments, I think/hope I've addressed all of your points except for the "predict_proba should be consistent" part.

I don't see where the weights are used in the prediction currently. If you let the user specify both the weights and the voting, then you need to make sure that the user doesn't shoot herself in the foot easily. So if voting='hard' and some weights are specified, they should also be used, I guess for weighted voting.

The weights are now used for the "hard" voting in the most recent update. I didn't implement it earlier because I really wouldn't want to use something like this myself, preferring the weighted avg. probabilities instead. But you are right about the "shooting in the foot" ;)

Something that might be tricky is that predict_proba should be consistent with predict in all cases. For hard voting that causes some minor issues, see #3891. You could just return the normalized votes but I am not a fan of that, as you throw away so much information (and your roc score will be abysmal).

Here, the challenge was that not all classifiers have predict_proba (SVMs, I think?); doing it this way at least allows "hard" voting when such classifiers are used in the "ensemble."

@amueller
Member

Some tests and doctests are missing imports, see travis.

@amueller
Member

Have you checked what happens in the multi-label case? If you don't want to implement that for the moment, please make sure an appropriate error message is raised.


Returns
-------
If not `weights=None`:
Member

this is not correct any more, right?

Member

Also, I don't think this is valid numpy doc format ;)
Lastly, I think the you should flatten the array to be (n_samples, *) and not have ndim=3, so that can directly be used in a pipeline.

@amueller
Member

Your implementation currently assumes np.unique(y) == np.arange(n_classes) as far as I can see. You need a LabelEncoder, or use self.classes_, y = np.unique(y, return_inverse=True), or something like that.
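A minimal sketch of that suggestion (the label values and vote results here are made up): map arbitrary labels to 0..n_classes-1 for the internal voting math, then map the winners back.

```python
import numpy as np

# Arbitrary (non-integer) class labels.
y = np.array(['spam', 'ham', 'spam', 'eggs'])

# classes_ holds the sorted unique labels; y_encoded is 0..n_classes-1.
classes_, y_encoded = np.unique(y, return_inverse=True)
# classes_ == ['eggs', 'ham', 'spam'], y_encoded == [2, 1, 2, 0]

# After voting on the encoded labels, recover the original labels:
maj_encoded = np.array([2, 0, 2, 1])   # hypothetical vote winners
maj = classes_[maj_encoded]            # back to string labels
```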


"""
def __init__(self, clfs, voting='hard', weights=None):
self.clfs = clfs
Member

Use estimators instead of clfs for consistency with the rest of the scikit.

@mblondel
Member

I would rather use the same technique as for SVC
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/svm/base.py#L547

I actually don't agree with @larsmans 's inline comment: introducing two estimators just for that sounds kind of silly.

@larsmans
Member

I still think that's the way to keep the codebase readable and maintainable. The SVC.predict_proba trick changes the type of an object after it's constructed, and it means that objects may not (appear to) have a method that the class does have.

I do prefer it to a NotImplementedError, though; that would be yet another convention to check for in meta-estimators.

@rasbt
Contributor Author

rasbt commented Apr 21, 2015

Okay, so we ignore the part

# Do not use this in new code; when probabilities are not available
# depending on a setting, introduce two estimators.

for now?
I am just wondering if we need the probability parameter. Can't we just check whether self.voting == 'hard',

instead of

    if not self.probability:
        raise AttributeError("predict_proba is not available when"
                             " probability=%r" % self.probability)

@agramfort
Member

agramfort commented Apr 21, 2015 via email

@amueller
Member

Yeah, I don't think we need a probability parameter, and I agree with using the SVC trick. It seems far more harmless than the now-abundant if_delegate_has_method.

@rasbt
Contributor Author

rasbt commented Apr 22, 2015

I have to admit that I don't really see through this "property" trick. How would the @property decorator come into play here if we simply check whether the VotingClassifier was initialized for hard or soft voting?

@amueller
Member

The code inside the @property should raise an AttributeError if voting="hard" and should return the function that actually implements predict_proba otherwise.

A property triggers a function evaluation but looks like an attribute access, so trying to access the predict_proba attribute will actually run the property's code.
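A minimal sketch of the trick being described (the class and helper names here are hypothetical, not the PR's actual code): accessing .predict_proba raises AttributeError under hard voting, so hasattr behaves the way duck-typing callers expect.

```python
import numpy as np

class VotingSketch:
    def __init__(self, voting='hard'):
        self.voting = voting

    def _predict_proba(self, X):
        # Stand-in for the real probability averaging.
        return np.full((len(X), 2), 0.5)

    @property
    def predict_proba(self):
        # Accessing the attribute runs this code; under hard voting the
        # attribute effectively does not exist.
        if self.voting == 'hard':
            raise AttributeError("predict_proba is not available when"
                                 " voting='hard'")
        return self._predict_proba

hasattr(VotingSketch('hard'), 'predict_proba')   # False
hasattr(VotingSketch('soft'), 'predict_proba')   # True
```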

@rasbt
Contributor Author

rasbt commented Apr 22, 2015

Thanks. I hope the most recent commit implements what you were looking for.

@rasbt
Contributor Author

rasbt commented May 1, 2015

Before this pull request collects dust again, I was wondering whether everything seems fine now!?

@amueller
Member

amueller commented May 1, 2015

+1. Thanks for the ping.

@amueller amueller changed the title [MRG] EnsembleClassifier implementation [MRG + 1] EnsembleClassifier implementation May 1, 2015
avg : array-like, shape = [n_samples, n_classes]
Weighted average probability for each class per sample.
"""
self._check_proba()
Member

You could inline this (the function is only called once) but... well.. however you like..

Contributor Author

Hm, yes, I also initially thought that this could be a little bit of an overkill ... but maybe it's slightly better this way in case there will be a more general check_proba function in some distant future

Member

cough but really I don't have a strong opinion ;)

Contributor Author

Okay, okay, that's strong enough :)

@glouppe
Contributor

glouppe commented May 8, 2015

I would have some cosmetic changes to do, but nothing significant. +1 for merge before it stalls.

@glouppe glouppe changed the title [MRG + 1] EnsembleClassifier implementation [MRG + 2] EnsembleClassifier implementation May 8, 2015
agramfort added a commit that referenced this pull request May 8, 2015
[MRG + 2] EnsembleClassifier implementation
@agramfort agramfort merged commit 9eb1964 into scikit-learn:master May 8, 2015
@agramfort
Member

thanks @rasbt !

@amueller
Member

amueller commented May 8, 2015

@glouppe what changes? Do you want to open an issue or open a PR?

@rasbt
Contributor Author

rasbt commented May 8, 2015

Just saw that it finally got merged! :) Thank you all for the great help during the implementation. Since I am a big fan and frequent user of scikit-learn, it's really nice to contribute, and I really learned a few great lessons here!

@glouppe
Contributor

glouppe commented May 8, 2015

Thanks for your contribution!

@amueller I did make some changes already in a follow-up commit, only cosmetics though.

@amueller
Member

amueller commented May 8, 2015

@rasbt thanks for the contribution and your patience!

@jnothman
Member

jnothman commented Jun 6, 2015

It looks to me like the intro to ensembles (top of http://scikit-learn.org/dev/modules/ensemble.html) should be updated

@rasbt
Contributor Author

rasbt commented Jun 11, 2015

@jnothman I agree. Although the VotingClassifier approach is different from bagging, I'd still put the former into the "averaging methods" category.
