
Conversation

rasbt
Contributor

@rasbt rasbt commented Jan 26, 2015

This is an implementation of an estimator to combine scikit-learn classification estimators into an ensemble classifier for majority rule class label prediction or classification based on weighted probability averages.

I added a section at the bottom of the ensemble.rst documentation with a more detailed description and examples.

I'd be happy to make improvements or adjustments if necessary; please let me know!

@jnothman
Member

I think EnsembleClassifier is probably a bit too general. Perhaps VotedClassifier or VotingClassifier or MajorityRuleClassifier.

@rasbt
Contributor Author

rasbt commented Jan 26, 2015

I agree, Joel. I like MajorityRuleClassifier, it pretty much summarizes the concept in one phrase.

@jnothman
Member

except that often you will be relying on the highest vote, not a strict
majority

On 26 January 2015 at 16:57, Sebastian Raschka notifications@github.com wrote:

I agree, Joel. I like MajorityRuleClassifier, it pretty much summarizes the concept in one phrase.

@jnothman
Member

Not that I've read the exact algorithm.

On 26 January 2015 at 19:39, Joel Nothman joel.nothman@sydney.edu.au wrote:

except that often you will be relying on the highest vote, not a strict majority

"""
if self.weights:
avg = self.predict_proba(X)
maj = np.apply_along_axis(lambda x: max(enumerate(x), key=operator.itemgetter(1))[0], axis=1, arr=avg)
Member

Could you refactor this to two lines maybe? Or perhaps replace it with the below...

maj = np.apply_along_axis(lambda x: max(enumerate(x),
                                        key=operator.itemgetter(1))[0],
                          axis=1, arr=avg)

(PEP 8 says that the max line length should be 79 characters...)

@glouppe
Contributor

glouppe commented Jan 26, 2015

The algorithm you have implemented is actually called "soft voting" rather than "majority voting". In the former, the prediction is the argmax of the sums of predict probas, in the latter the prediction is the argmax of the sums of predictions.

By the way, a more general algorithm is "stacking", which basically learns a new estimator from the predictions of others. Shouldn't we try to go in that direction instead, with majority voting/soft voting as special cases?
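The soft/hard distinction described above can be sketched with plain NumPy (toy probabilities and class indices 0..2 are made up for illustration):

```python
import numpy as np

# Toy predicted probabilities: 3 classifiers x 2 samples x 3 classes.
probas = np.array([
    [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]],   # classifier 1
    [[0.4, 0.4, 0.2], [0.1, 0.2, 0.7]],   # classifier 2
    [[0.5, 0.2, 0.3], [0.3, 0.4, 0.3]],   # classifier 3
])

# Soft voting: argmax of the averaged (equivalently, summed) probabilities.
soft = np.argmax(probas.mean(axis=0), axis=1)

# Hard (majority) voting: each classifier casts one vote for its own
# argmax, and the most frequent class label wins.
votes = np.argmax(probas, axis=2)          # shape (n_classifiers, n_samples)
hard = np.apply_along_axis(
    lambda col: np.bincount(col, minlength=3).argmax(), axis=0, arr=votes)

# The two schemes can disagree (here they do on the second sample).
```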

@arjoly
Member

arjoly commented Jan 26, 2015

By the way, a more general algorithm is "stacking", which basically learns a new estimator from the predictions of others. Shouldn't we try to go in that direction instead, with majority voting/soft voting as special cases?

+1

@amueller
Member

It seemed to me that stacking was less used in practice, but Owen (from Databricks, Kaggle #1) said he uses it quite a bit.
I think it would be nice to have stacking, but the simple case is also helpful, right? So why not implement this first and possibly add stacking later? The implementation is pretty simple anyhow...

@rasbt
Contributor Author

rasbt commented Jan 26, 2015

The algorithm you have implemented is actually called "soft voting" rather than "majority voting". In the former, the prediction is the argmax of the sums of predict probas, in the latter the prediction is the argmax of the sums of predictions.

Hm, sounds like I implemented both at once then. If weights are provided, it is "soft voting" (based on probas), and without weights it is "majority voting" (based on class labels).

Do you think it would make sense to split the current implementation into a SoftVotingClassifier and a MajorityRuleClassifier for clarity?

About "stacking": I agree with @aemuller that this could be implemented additionally later. E.g., by using the attributes from the MajorityRuleClassifier as input for a new classifier. On the other hand, would this need an implementation? Maybe an example would be sufficient that shows how the returned predictions/probas from the MajorityRule/SoftVoting classifier can be used as input for a new classifier...

@amueller
Member

@rasbt in stacking (see ESL) the probabilities of the first stage of classifiers are obtained using out of sample estimates using cross-validation, so it is a bit more complex.
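For illustration, a rough sketch of that idea using today's scikit-learn API (which post-dates this PR; the estimator choices and cv=5 are arbitrary):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_predict

X, y = load_iris(return_X_y=True)
base = [LogisticRegression(max_iter=1000),
        DecisionTreeClassifier(random_state=0)]

# Stacking: first-stage probabilities are obtained out-of-sample via
# cross-validation, so the second stage never sees fitted-on-itself probas.
meta_features = np.hstack([
    cross_val_predict(clf, X, y, cv=5, method='predict_proba')
    for clf in base
])

# Second-stage model learns how to combine the base estimators' outputs.
meta = LogisticRegression(max_iter=1000).fit(meta_features, y)
```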

@amueller
Member

I wouldn't split this up into two classes, but I would add string options for using majority or soft voting. Maybe call it VotingClassifier and add a voting parameter that can be 'hard', 'soft', or a sequence of weights? Not sure if the name voting is very discoverable; weights seems more natural.

@amueller
Member

With your current interface it would also be possible to just add a uniform keyword, which might be simpler than specifying a constant vector.

@rasbt
Contributor Author

rasbt commented Jan 26, 2015

@amueller Good idea! I added the voting parameter which can be set to 'soft' or 'hard'.

About the uniform keyword: I don't think it is necessary. Right now, uniform weights are used by default if weights=None, which is maybe cleaner because np.average also has a weights parameter.
avg = np.average(self.probas_, axis=0, weights=self.weights)
Or would you think that an explicit 'uniform' keyword could be beneficial for clarity?
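For illustration, a toy example of how weights=None falls back to a uniform average (the probabilities are made up):

```python
import numpy as np

# Stacked per-classifier probabilities for one sample, 3 classes.
probas_ = np.array([[0.7, 0.2, 0.1],
                    [0.4, 0.4, 0.2],
                    [0.1, 0.6, 0.3]])

# weights=None gives a plain uniform average, same as probas_.mean(axis=0).
uniform = np.average(probas_, axis=0, weights=None)

# Explicit weights shift the average toward the heavier classifiers.
weighted = np.average(probas_, axis=0, weights=[2, 1, 1])
```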

self.voting = voting

if weights and len(weights) != len(clfs):
    raise ValueError('Number of classifiers and weights must be equal')
Member

maybe also give the numbers here.

@amueller
Member

I don't see where the weights are used in the prediction currently. If you let the user specify both the weights and the voting, then you need to make sure that the user doesn't shoot herself in the foot easily. So if voting='hard' and some weights are specified, they should also be used, I guess for weighted voting.

I don't think you need to add an additional keyword uniform. It is just that a user should be able to use both, hard and soft voting, without having to specify the number of classes in the estimator construction. I think that is possible with your current solution.

Something that might be tricky is that predict_proba should be consistent with predict in all cases. For hard voting that causes some minor issues, see #3891. You could just return the normalized votes but I am not a fan of that, as you throw away so much information (and your roc score will be abysmal).

Parameters
----------
clfs : array-like, shape = [n_classifiers]
A list of scikit-learn classifier objects.
Member

Same comment about redundancy.

@amueller
Member

You should add the class in sklearn.utils.testing to DONT_TEST I think.

@rasbt
Contributor Author

rasbt commented Jan 26, 2015

@amueller Thanks for the comments, I think/hope I've addressed all of your points except for the "predict_proba should be consistent" part.

I don't see where the weights are used in the prediction currently. If you let the user specify both the weights and the voting, then you need to make sure that the user doesn't shoot herself in the foot easily. So if voting='hard' and some weights are specified, they should also be used, I guess for weighted voting.

The weights are now used for the "hard" voting in the most recent update. I didn't implement it earlier because I really wouldn't want to use something like this myself, preferring the weighted avg. probabilities instead. But you are right about the "shooting in the foot" ;)

Something that might be tricky is that predict_proba should be consistent with predict in all cases. For hard voting that causes some minor issues, see #3891. You could just return the normalized votes but I am not a fan of that, as you throw away so much information (and your roc score will be abysmal).

Here, the challenge was that not all classifiers have predict_proba (SVMs, I think?); doing it this way at least allows "hard" voting when such classifiers are used in the "ensemble."

@amueller
Member

Some tests and doctests are missing imports, see travis.

@amueller
Member

Have you checked what happens in the multi-label case? If you don't want to implement that for the moment, please make sure an appropriate error message is raised.


Returns
-------
If not `weights=None`:
Member

this is not correct any more, right?

Member

Also, I don't think this is valid numpy doc format ;)
Lastly, I think the you should flatten the array to be (n_samples, *) and not have ndim=3, so that can directly be used in a pipeline.

@amueller
Member

Your implementation currently assumes np.unique(y) == np.arange(n_classes) as far as I can see. You need a LabelEncoder, or use self.classes_, y = np.unique(y, return_inverse=True), or something like that.
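A minimal sketch of that suggestion (the label values and vote results here are made up): map arbitrary labels to 0..n_classes-1 for the internal voting math, then map the winners back.

```python
import numpy as np

# Arbitrary (non-integer) class labels.
y = np.array(['spam', 'ham', 'spam', 'eggs'])

# classes_ holds the sorted unique labels; y_encoded is 0..n_classes-1.
classes_, y_encoded = np.unique(y, return_inverse=True)
# classes_ == ['eggs', 'ham', 'spam'], y_encoded == [2, 1, 2, 0]

# After voting on the encoded labels, recover the original labels:
maj_encoded = np.array([2, 0, 2, 1])   # hypothetical vote winners
maj = classes_[maj_encoded]            # back to string labels
```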


"""
def __init__(self, clfs, voting='hard', weights=None):
self.clfs = clfs
Member

Use estimators instead of clfs for consistency with the rest of the scikit.

@mblondel
Member

I would rather use the same technique as for SVC
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/svm/base.py#L547

I actually don't agree with @larsmans 's inline comment: introducing two estimators just for that sounds kind of silly.

@larsmans
Member

I still think that's the way to keep the codebase readable and maintainable. The SVC.predict_proba trick changes the type of an object after it's constructed, and it means that objects may not (appear to) have a method that the class does have.

I do prefer it to a NotImplementedError, though; that would be yet another convention to check for in meta-estimators.

@rasbt
Contributor Author

rasbt commented Apr 21, 2015

Okay, so we ignore the part

# Do not use this in new code; when probabilities are not available
# depending on a setting, introduce two estimators.

for now?
I am just wondering if we need the probability parameter. Can't we just check whether self.voting == 'hard',

instead of

    if not self.probability:
        raise AttributeError("predict_proba is not available when"
                             " probability=%r" % self.probability)

@agramfort
Member

agramfort commented Apr 21, 2015 via email

@amueller
Member

Yeah, I don't think we need a probability parameter, and I agree with using the SVC trick. It seems far more harmless than the now-abundant if_delegate_has_method.

@rasbt
Contributor Author

rasbt commented Apr 22, 2015

I have to admit that I don't really see through this "property" trick. How would the @property decorator come into play here if we simply check whether the VotingClassifier was initialized for hard or soft voting?

@amueller
Member

The code inside the @property should raise an AttributeError if voting="hard" and should return the function that actually implements predict_proba otherwise.

A property triggers a function evaluation but looks like an attribute access, so trying to access the predict_proba attribute will actually run the property's code.
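A minimal sketch of the trick being described (the class and helper names here are hypothetical, not the PR's actual code): accessing .predict_proba raises AttributeError under hard voting, so hasattr behaves the way duck-typing callers expect.

```python
import numpy as np

class VotingSketch:
    def __init__(self, voting='hard'):
        self.voting = voting

    def _predict_proba(self, X):
        # Stand-in for the real probability averaging.
        return np.full((len(X), 2), 0.5)

    @property
    def predict_proba(self):
        # Accessing the attribute runs this code; under hard voting the
        # attribute effectively does not exist.
        if self.voting == 'hard':
            raise AttributeError("predict_proba is not available when"
                                 " voting='hard'")
        return self._predict_proba

hasattr(VotingSketch('hard'), 'predict_proba')   # False
hasattr(VotingSketch('soft'), 'predict_proba')   # True
```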

@rasbt
Contributor Author

rasbt commented Apr 22, 2015

Thanks. I hope the most recent commit implements what you were looking for.

@rasbt
Contributor Author

rasbt commented May 1, 2015

Before this pull request collects dust again, I was wondering whether everything seems fine now!?

@amueller
Member

amueller commented May 1, 2015

+1. Thanks for the ping.

@amueller amueller changed the title [MRG] EnsembleClassifier implementation [MRG + 1] EnsembleClassifier implementation May 1, 2015
avg : array-like, shape = [n_samples, n_classes]
Weighted average probability for each class per sample.
"""
self._check_proba()
Member

You could inline this (the function is only called once) but... well.. however you like..

Contributor Author

Hm, yes, I also initially thought that this could be a little bit of an overkill ... but maybe it's slightly better this way in case there will be a more general check_proba function in some distant future

Member

cough but really I don't have a strong opinion ;)

Contributor Author

Okay, okay, that's strong enough :)

@glouppe
Contributor

glouppe commented May 8, 2015

I would have some cosmetic changes to do, but nothing significant. +1 for merge before it stalls.

@glouppe glouppe changed the title [MRG + 1] EnsembleClassifier implementation [MRG + 2] EnsembleClassifier implementation May 8, 2015
agramfort added a commit that referenced this pull request May 8, 2015
[MRG + 2] EnsembleClassifier implementation
@agramfort agramfort merged commit 9eb1964 into scikit-learn:master May 8, 2015
@agramfort
Member

thanks @rasbt !

@amueller
Member

amueller commented May 8, 2015

@glouppe what changes? Do you want to open an issue or open a PR?

@rasbt
Contributor Author

rasbt commented May 8, 2015

Just saw that it finally got merged! :) Thank you all for the great help during the implementation. Since I am a big fan and frequent user of scikit-learn, it's really nice to contribute, and I really learned a few great lessons here!

@glouppe
Contributor

glouppe commented May 8, 2015

Thanks for your contribution!

@amueller I did make some changes already in a follow-up commit, only cosmetics though.

@amueller
Member

amueller commented May 8, 2015

@rasbt thanks for the contribution and your patience!

@jnothman
Member

jnothman commented Jun 6, 2015

It looks to me like the intro to ensembles (top of http://scikit-learn.org/dev/modules/ensemble.html) should be updated

@rasbt
Contributor Author

rasbt commented Jun 11, 2015

@jnothman I agree. Although the VotingClassifier approach is different from bagging, I'd still put the former into the "averaging methods" category.
