
[MRG+1] Change VotingClassifier estimators by set_params #7674


Merged 12 commits on Apr 10, 2017

Conversation

yl565
Contributor

@yl565 yl565 commented Oct 14, 2016

PR to #7288. Continuation of #7484

yl565 added 2 commits October 14, 2016 18:26
Use _BaseComposition as base
@jnothman
Member

You shouldn't need to open a new PR for a rebase. I'll try to take a look at this over the coming weeks. Sorry for the slow reviews.

@yl565 yl565 changed the title Change VotingClassifier estimators by set_params [MRG] Change VotingClassifier estimators by set_params Oct 17, 2016
@yl565
Contributor Author

yl565 commented Oct 17, 2016

@jnothman, I have just started using git, do you mind letting me know if the following procedure is correct for a rebase (assume origin is already up to date with upstream/master):

git checkout mybranch
git rebase master
git push origin master

@amueller
Member

The last command should be git push origin mybranch.
Also see: http://docs.scipy.org/doc/numpy/dev/gitwash/development_workflow.html#rebasing-on-master

Member

@jnothman jnothman left a comment

Sorry this has taken a while to get to

@@ -44,7 +43,8 @@ class VotingClassifier(BaseEstimator, ClassifierMixin, TransformerMixin):
estimators : list of (string, estimator) tuples
Invoking the ``fit`` method on the ``VotingClassifier`` will fit clones
of those original estimators that will be stored in the class attribute
`self.estimators_`.
`self.estimators_`. An estimator can be set to `None` using
Member

these should use double-backticks.

isnone = np.array([1 if clf is None else 0
for _, clf in self.estimators])
if isnone.sum() == len(self.estimators):
raise ValueError('All estimators is None. At least one is required'
Member

'is None' -> 'are None'

@@ -161,11 +166,19 @@ def fit(self, X, y, sample_weight=None):

self.estimators_ = Parallel(n_jobs=self.n_jobs)(
Member

This change to estimators_ needs to be documented under Attributes


return self

@property
def _narej_weights(self):
Member

What is narej?

Contributor Author

short for rejecting NaN, may be changed to _rej_nan_weights to be more clear?

Member

Confusing because there aren't any NaNs involved. How about _weights_not_none... or just remove this helper
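As a sketch of the suggested rename, the helper could look roughly like this; ``VotingSketch`` is a stand-in class used only for illustration, not scikit-learn's actual ``VotingClassifier``:

```python
# Minimal stand-in illustrating the suggested _weights_not_none helper:
# drop the weights whose paired estimator is None, so soft voting
# averages only over the estimators that were actually fitted.
class VotingSketch:
    def __init__(self, estimators, weights=None):
        self.estimators = estimators  # list of (name, estimator-or-None)
        self.weights = weights

    @property
    def _weights_not_none(self):
        """Weights of the estimators that are not None."""
        if self.weights is None:
            return None
        return [w for (_, clf), w in zip(self.estimators, self.weights)
                if clf is not None]

clf = VotingSketch([('lr', object()), ('rf', None), ('nb', object())],
                   weights=[1, 1, 0.5])
print(clf._weights_not_none)  # [1, 0.5]
```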


__all__ = ['if_delegate_has_method']


class _BaseComposition(six.with_metaclass(ABCMeta, BaseEstimator)):
Member

You should be using this in Pipeline and FeatureUnion

Contributor Author

Do you mean I should also modify Pipeline and FeatureUnion to use _BaseComposition?

Member

yes

eclf2.set_params(voting='soft').fit(X, y)
assert_array_equal(eclf1.predict(X), eclf2.predict(X))
assert_array_equal(eclf1.predict_proba(X), eclf2.predict_proba(X))
msg = ('All estimators is None. At least one is required'
Member

is -> are

eclf1.set_params(voting='soft').fit(X, y)
eclf2.set_params(voting='soft').fit(X, y)
assert_array_equal(eclf1.predict(X), eclf2.predict(X))
assert_array_equal(eclf1.predict_proba(X), eclf2.predict_proba(X))
Member

Please test soft transform. The outputs should differ between the 0-weight and None variants, though...
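The distinction between the 0-weight and None variants can be sketched with plain NumPy, using hypothetical per-estimator class probabilities: the weighted soft-voting averages can coincide, while ``transform``'s stacked outputs differ in shape.

```python
import numpy as np

# Hypothetical class probabilities for one sample from three estimators.
p_lr = np.array([0.7, 0.3])
p_rf = np.array([0.9, 0.1])
p_nb = np.array([0.4, 0.6])

# Zero-weight variant: rf is still fitted and still appears in
# transform's output; the weighted average merely ignores it.
avg_zero = np.average([p_lr, p_rf, p_nb], axis=0, weights=[1, 0, 0.5])

# None variant: rf is dropped before fitting, so transform would return
# only two probability arrays, averaged over the remaining weights.
avg_none = np.average([p_lr, p_nb], axis=0, weights=[1, 0.5])

# The averaged predictions coincide, but the transform outputs differ:
# shape (3, 2) with a zero weight versus (2, 2) with None.
print(np.allclose(avg_zero, avg_none))  # True
```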

1. Use ``_BaseComposition`` in class ``Pipeline`` and ``FeatureUnion``
2. Add tests of soft voting ``transform`` when one estimator is set to None
3. Add estimator name validation in ``_BaseComposition`` and tests
4. Other requested changes.
@yl565
Contributor Author

yl565 commented Nov 13, 2016

@jnothman I've made the requested changes. Also added estimator name validation and tests.

Member

@jnothman jnothman left a comment

LGTM, thanks!

eclf2 = VotingClassifier(estimators=[('rf', clf2), ('nb', clf3)],
voting='soft', weights=[1, 0.5])
eclf2.set_params(rf=None).fit(X1, y1)
assert_array_equal(eclf1.transform(X1), np.array([[[0.7, 0.3], [0.3, 0.7]],
Member

Hmm. Looking at this makes me wonder whether we should be multiplying the outputs by the weight. Not an issue for this PR.

Member

Should we test the output of "hard-voting" and transform as well?

self.voting = voting
self.weights = weights
self.n_jobs = n_jobs

@property
def named_estimators(self):
Member

I wish this didn't exist, but I know it's not your problem.

Member

You wish the property didn't exist? Or that it wasn't a property but a function?

Member

I wish that we had not copied this bad design feature from pipeline!

@jnothman jnothman changed the title [MRG] Change VotingClassifier estimators by set_params [MRG+1] Change VotingClassifier estimators by set_params Nov 16, 2016
@jnothman
Member

jnothman commented Jan 9, 2017

Looking for a reviewer...

@jnothman
Member

jnothman commented Jan 9, 2017

This also needs merging with updated master.

@yl565
Contributor Author

yl565 commented Jan 21, 2017

@jnothman I merged it with updated master. Can you help me check why the AppVeyor build was cancelled?

@jnothman
Member

The build was cancelled because all our AppVeyor workers were stuck and had to be cancelled. Don't worry about it.

@prcastro

prcastro commented Feb 9, 2017

Anything preventing this from getting merged?

@jnothman
Member

jnothman commented Feb 9, 2017

Anything preventing this from getting merged?

A backlog and low reviewer availability. We require two approvals.

@jnothman
Member

jnothman commented Mar 8, 2017

Feel like reviewing this, @lesteve? @raghavrv?

Member

@MechCoder MechCoder left a comment

Just some minor comments related to testing.

eclf2.set_params(nb=clf2).fit(X, y)
assert_array_equal(eclf1.predict(X), eclf2.predict(X))
assert_array_equal(eclf1.predict_proba(X), eclf2.predict_proba(X))

Member

Can you directly check

assert_equal(eclf2.estimators[0][1].get_params(), clf1.get_params())
assert_equal(eclf2.estimators[1][1].get_params(), clf2.get_params())

Member

(It might be possible that two different classifiers give the same predictions)

('nb', clf3)],
voting='hard', weights=[1, 1, 0.5])
eclf2.set_params(rf=None).fit(X, y)
assert_array_equal(eclf1.predict(X), eclf2.predict(X))
Member

Can you also check eclf2.estimators_, eclf2.estimators and eclf2.get_params()?

Member

I was suggesting to test the behaviour of eclf2.estimators, eclf2.estimators_ and ``eclf2.get_params()``.

assert_true(dict(eclf2.estimators)["rf"] is None)
assert_true(len(eclf2.estimators_) == 2)
assert_true(all(not isinstance(est, RandomForestClassifier)
                for est in eclf2.estimators_))
assert_true(eclf2.get_params()["rf"] is None)

eclf2 = VotingClassifier(estimators=[('rf', clf2), ('nb', clf3)],
voting='soft', weights=[1, 0.5])
eclf2.set_params(rf=None).fit(X1, y1)
assert_array_equal(eclf1.transform(X1), np.array([[[0.7, 0.3], [0.3, 0.7]],
Member

Should we test the output of "hard-voting" and transform as well?

eclf1.set_params(lr__C=10.0)
eclf2.set_params(nb__max_depth=5)

assert_true(eclf1.estimators[0][1].get_params()['C'] == 10.0)
Member

Should we also test the get_params() interface of the VotingClassifier directly? More specifically, eclf1.get_params()["lr__C"] and eclf1.get_params()["lr"].get_params()["C"]? The get_params() interface seems untested.

names, clfs = zip(*self.estimators)
self._validate_names(names)

isnone = np.array([1 if clf is None else 0
Member

nitpick:

isnone = np.sum([clf is None for _, clf in self.estimators])
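With the nitpick applied, the validation could read roughly as follows; ``validate_estimators`` is a hypothetical free-standing helper for illustration (in the PR the check lives inside ``fit``):

```python
import numpy as np

def validate_estimators(estimators):
    # Count None entries with the reviewer's suggested one-liner and
    # reject the ensemble if no usable estimator remains.
    n_none = np.sum([clf is None for _, clf in estimators])
    if n_none == len(estimators):
        raise ValueError('All estimators are None. At least one is '
                         'required to be a classifier!')

validate_estimators([('lr', object()), ('rf', None)])  # passes silently
```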

@MechCoder
Member

Extremely sorry for the long wait @yl565 ! There seem to be two additions in this PR:

  1. Ability to substitute or set a classifier directly using the set_params interface.
  2. Ability to set the classifier to None or disabling it using set_params.

Given that in master, both these fail silently, should we document these as new features or bug-fixes?
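The mechanism behind both additions can be sketched in a few lines of plain Python; ``CompositionSketch`` is a stand-in for ``_BaseComposition``, not scikit-learn's actual implementation, and the string estimators are placeholders for real classifier instances.

```python
# Stand-in for the set_params mechanism this PR adds: a named step can
# be replaced wholesale (addition 1), or disabled by setting it to None
# (addition 2), instead of failing silently.
class CompositionSketch:
    def __init__(self, estimators):
        self.estimators = list(estimators)  # (name, estimator-or-None)

    def set_params(self, **params):
        names = [name for name, _ in self.estimators]
        for key, value in params.items():
            if key in names:
                # Substitute the named estimator, or disable it with None.
                self.estimators[names.index(key)] = (key, value)
            else:
                raise ValueError('Invalid parameter %s' % key)
        return self

ensemble = CompositionSketch([('lr', 'lr-estimator'), ('rf', 'rf-estimator')])
ensemble.set_params(lr='sgd-estimator')  # 1. substitute a classifier
ensemble.set_params(rf=None)             # 2. disable a classifier
print(ensemble.estimators)  # [('lr', 'sgd-estimator'), ('rf', None)]
```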

@jnothman
Member

jnothman commented Apr 4, 2017 via email

@MechCoder
Member

Ah I see why, in that case we should also test that

vclf.set_params(nb=clf2)
assert_false(hasattr(vclf, "nb"))

Also, a weird corner case, but it might be a good idea to also test what happens when one sets estimators, the classifiers by themselves and the hyperparameters of the classifiers at once.

@@ -252,17 +270,13 @@ def transform(self, X):
else:
return self._predict(X)

def set_params(self, **params):
Member

Can you document this?

for key, value in six.iteritems(step.get_params(deep=True)):
out['%s__%s' % (name, key)] = value
return out
return super(VotingClassifier,
Member

Can you document this?

@yl565
Contributor Author

yl565 commented Apr 5, 2017

@MechCoder I added the tests and documentation. Could you please explain more on the following two of your comments?

Can you also check eclf2.estimators_, eclf2.estimators and eclf2.get_params()

Also, a weird corner case, but it might be a good idea to also test what happens when one sets estimators, the classifiers by themselves and the hyperparameters of the classifiers at once.

('nb', clf3)],
voting='hard', weights=[1, 1, 0.5])
eclf2.set_params(rf=None).fit(X, y)
assert_array_equal(eclf1.predict(X), eclf2.predict(X))
Member

I was suggesting to test the behaviour of eclf2.estimators, eclf2.estimators_ and ``eclf2.get_params()``.

assert_true(dict(eclf2.estimators)["rf"] is None)
assert_true(len(eclf2.estimators_) == 2)
assert_true(all(not isinstance(est, RandomForestClassifier)
                for est in eclf2.estimators_))
assert_true(eclf2.get_params()["rf"] is None)

def set_params(self, **params):
""" Setting the parameters for the voting classifier

Valid parameter keys can be listed with get_params().
Member

Can you add under the get_params heading? I would just say "Get the parameters of the VotingClassifier". In addition, I would also document the parameter deep saying that setting it to True gets the various classifiers and the parameters of the classifiers as well.

Specific parameters using e.g. set_params(parameter_name=new_value)
Estimators can be removed by setting them to None. In the following
example, the RandomForestClassifier is removed:
clf1 = LogisticRegression()
Member

I am fairly sure that you have to add this under a side-heading "Examples" so that it renders properly

Parameters
----------
params: keyword arguments
Specific parameters using e.g. set_params(parameter_name=new_value)
Member

@MechCoder MechCoder Apr 7, 2017

In addition to setting the parameters of the ``VotingClassifier``, the individual classifiers can also be set, or replaced by setting them to None.

@MechCoder
Member

That is it from me! Please add a whatsnew under Enhancements and rebase properly so that I can merge.

@MechCoder
Member

You just need to do an interactive rebase

git rebase -i master

It will spit out errors wherever there are merge conflicts. You would need to check wherever the following block is present.

<<<<<<< HEAD
some code
=======
some code
>>>>>>> your-branch

and decide how you would like to merge the two blocks. Then you do

git add name_of_file
git rebase --continue

and keep continuing.

@jnothman
Member

jnothman commented Apr 8, 2017 via email

@MechCoder
Member

Travis is failing for some cosmetic reasons. Could you fix that?

@yl565
Contributor Author

yl565 commented Apr 10, 2017

Thanks @MechCoder

@MechCoder MechCoder merged commit 194c231 into scikit-learn:master Apr 10, 2017
@MechCoder
Member

Thank you @yl565 !!

massich pushed a commit to massich/scikit-learn that referenced this pull request Apr 11, 2017
…cikit-learn#7674)

* PR to 7288
Use _BaseComposition as base

*  Fix flakes problem

* Change ``pipeline``, add more tests and other changes
1. Use ``_BaseComposition`` in class ``Pipeline`` and ``FeatureUnion``
2. Add tests of soft voting ``transform`` when one estimator is set to None
3. Add estimator name validation in ``_BaseComposition`` and tests
4. Other requested changes.

* Remove the unused import warn

* Add more test and documentation

* resolve conflict with master

* Add testing cases and modify documentation

* Add to whats_new.rst

*  Fix too many blank lines
@prcastro

🎉

massich pushed a commit to massich/scikit-learn that referenced this pull request Apr 26, 2017
Sundrique pushed a commit to Sundrique/scikit-learn that referenced this pull request Jun 14, 2017
dmohns pushed a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017
dmohns pushed a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017
NelleV pushed a commit to NelleV/scikit-learn that referenced this pull request Aug 11, 2017
paulha pushed a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017
maskani-moh pushed a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017
jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017
5 participants