
[MRG+1] Change VotingClassifier estimators by set_params #7674


Merged 12 commits on Apr 10, 2017

Conversation

yl565
Contributor

@yl565 yl565 commented Oct 14, 2016

PR to #7288. Continuation of #7484

yl565 added 2 commits October 14, 2016 18:26
Use _BaseComposition as base
@jnothman
Member

You shouldn't need to open a new PR for a rebase. I'll try to take a look at this over the coming weeks. Sorry for the slow reviews.

@yl565 yl565 changed the title Change VotingClassifier estimators by set_params [MRG] Change VotingClassifier estimators by set_params Oct 17, 2016
@yl565
Contributor Author

yl565 commented Oct 17, 2016

@jnothman, I have just started using git, do you mind letting me know if the following procedure is correct for a rebase (assume origin is already up to date with upstream/master):

git checkout mybranch
git rebase master
git push origin master

@amueller
Member

The last command should be git push origin mybranch.
Also see: http://docs.scipy.org/doc/numpy/dev/gitwash/development_workflow.html#rebasing-on-master

Member

@jnothman jnothman left a comment

Sorry this has taken a while to get to

@@ -44,7 +43,8 @@ class VotingClassifier(BaseEstimator, ClassifierMixin, TransformerMixin):
estimators : list of (string, estimator) tuples
Invoking the ``fit`` method on the ``VotingClassifier`` will fit clones
of those original estimators that will be stored in the class attribute
`self.estimators_`.
`self.estimators_`. An estimator can be set to `None` using
Member

these should use double-backticks.

isnone = np.array([1 if clf is None else 0
for _, clf in self.estimators])
if isnone.sum() == len(self.estimators):
raise ValueError('All estimators is None. At least one is required'
Member

'is None' -> 'are None'

@@ -161,11 +166,19 @@ def fit(self, X, y, sample_weight=None):

self.estimators_ = Parallel(n_jobs=self.n_jobs)(
Member

This change to estimators_ needs to be documented under Attributes


return self

@property
def _narej_weights(self):
Member

What is narej?

Contributor Author

short for rejecting NaN, may be changed to _rej_nan_weights to be more clear?

Member

Confusing because there aren't any NaNs involved. How about _weights_not_none... or just remove this helper
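As a sketch of the suggested rename, the helper could look roughly like this; ``VotingSketch`` is a stand-in class used only for illustration, not scikit-learn's actual ``VotingClassifier``:

```python
# Minimal stand-in illustrating the suggested _weights_not_none helper:
# drop the weights whose paired estimator is None, so soft voting
# averages only over the estimators that were actually fitted.
class VotingSketch:
    def __init__(self, estimators, weights=None):
        self.estimators = estimators  # list of (name, estimator-or-None)
        self.weights = weights

    @property
    def _weights_not_none(self):
        """Weights of the estimators that are not None."""
        if self.weights is None:
            return None
        return [w for (_, clf), w in zip(self.estimators, self.weights)
                if clf is not None]

clf = VotingSketch([('lr', object()), ('rf', None), ('nb', object())],
                   weights=[1, 1, 0.5])
print(clf._weights_not_none)  # [1, 0.5]
```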


__all__ = ['if_delegate_has_method']


class _BaseComposition(six.with_metaclass(ABCMeta, BaseEstimator)):
Member

You should be using this in Pipeline and FeatureUnion

Contributor Author

Do you mean I should also modify Pipeline and FeatureUnion to use _BaseComposition?

Member

yes

eclf2.set_params(voting='soft').fit(X, y)
assert_array_equal(eclf1.predict(X), eclf2.predict(X))
assert_array_equal(eclf1.predict_proba(X), eclf2.predict_proba(X))
msg = ('All estimators is None. At least one is required'
Member

is -> are

eclf1.set_params(voting='soft').fit(X, y)
eclf2.set_params(voting='soft').fit(X, y)
assert_array_equal(eclf1.predict(X), eclf2.predict(X))
assert_array_equal(eclf1.predict_proba(X), eclf2.predict_proba(X))
Member

Please test soft transform. The outputs should differ between the 0-weight and None variants, though...
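The distinction between the 0-weight and None variants can be sketched with plain NumPy, using hypothetical per-estimator class probabilities: the weighted soft-voting averages can coincide, while ``transform``'s stacked outputs differ in shape.

```python
import numpy as np

# Hypothetical class probabilities for one sample from three estimators.
p_lr = np.array([0.7, 0.3])
p_rf = np.array([0.9, 0.1])
p_nb = np.array([0.4, 0.6])

# Zero-weight variant: rf is still fitted and still appears in
# transform's output; the weighted average merely ignores it.
avg_zero = np.average([p_lr, p_rf, p_nb], axis=0, weights=[1, 0, 0.5])

# None variant: rf is dropped before fitting, so transform would return
# only two probability arrays, averaged over the remaining weights.
avg_none = np.average([p_lr, p_nb], axis=0, weights=[1, 0.5])

# The averaged predictions coincide, but the transform outputs differ:
# shape (3, 2) with a zero weight versus (2, 2) with None.
print(np.allclose(avg_zero, avg_none))  # True
```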

1. Use ``_BaseComposition`` in class ``Pipeline`` and ``FeatureUnion``
2. Add tests of soft voting ``transform`` when one estimator is set to None
3. Add estimator name validation in ``_BaseComposition`` and tests
4. Other requested changes.
@yl565
Contributor Author

yl565 commented Nov 13, 2016

@jnothman I've made the requested changes. Also added estimator name validation and tests.

Member

@jnothman jnothman left a comment

LGTM, thanks!

eclf2 = VotingClassifier(estimators=[('rf', clf2), ('nb', clf3)],
voting='soft', weights=[1, 0.5])
eclf2.set_params(rf=None).fit(X1, y1)
assert_array_equal(eclf1.transform(X1), np.array([[[0.7, 0.3], [0.3, 0.7]],
Member

Hmm. Looking at this makes me wonder whether we should be multiplying the outputs by the weight. Not an issue for this PR.

Member

Should we test the output of "hard-voting" and transform as well?

self.voting = voting
self.weights = weights
self.n_jobs = n_jobs

@property
def named_estimators(self):
Member

I wish this didn't exist, but I know it's not your problem.

Member

You wish the property didn't exist? Or that it wasn't a property but a function?

Member

I wish that we had not copied this bad design feature from pipeline!

@jnothman jnothman changed the title [MRG] Change VotingClassifier estimators by set_params [MRG+1] Change VotingClassifier estimators by set_params Nov 16, 2016
@jnothman
Member

jnothman commented Jan 9, 2017

Looking for a reviewer...

@jnothman
Member

jnothman commented Jan 9, 2017

This also needs merging with updated master.

@yl565
Contributor Author

yl565 commented Jan 21, 2017

@jnothman I merged it with updated master. Can you help me check why the AppVeyor build was cancelled?

@jnothman
Member

The build was cancelled because all our AppVeyor workers were stuck and had to be cancelled. Don't worry about it.

@prcastro

prcastro commented Feb 9, 2017

Anything preventing this from getting merged?

@jnothman
Member

jnothman commented Feb 9, 2017

Anything preventing this from getting merged?

A backlog and low reviewer availability. We require two approvals.

@jnothman
Member

jnothman commented Mar 8, 2017

Feel like reviewing this, @lesteve? @raghavrv?

Member

@MechCoder MechCoder left a comment

Just some minor comments related to testing.

eclf2.set_params(nb=clf2).fit(X, y)
assert_array_equal(eclf1.predict(X), eclf2.predict(X))
assert_array_equal(eclf1.predict_proba(X), eclf2.predict_proba(X))

Member

Can you directly check

assert_equal(eclf2.estimators[0][1].get_params(), clf1.get_params())
assert_equal(eclf2.estimators[1][1].get_params(), clf2.get_params())

Member

(It might be possible that two different classifiers give the same predictions)

('nb', clf3)],
voting='hard', weights=[1, 1, 0.5])
eclf2.set_params(rf=None).fit(X, y)
assert_array_equal(eclf1.predict(X), eclf2.predict(X))
Member

Can you also check eclf2.estimators_, eclf2.estimators and eclf2.get_params()?

Member

I was suggesting to test the behaviour of eclf2.estimators, eclf2.estimators_ and ``eclf2.get_params()``.

assert_true(dict(eclf2.estimators)["rf"] is None)
assert_true(len(eclf2.estimators_) == 2)
assert_true(all(not isinstance(est, RandomForestClassifier)
                for est in eclf2.estimators_))
assert_true(eclf2.get_params()["rf"] is None)

eclf2 = VotingClassifier(estimators=[('rf', clf2), ('nb', clf3)],
voting='soft', weights=[1, 0.5])
eclf2.set_params(rf=None).fit(X1, y1)
assert_array_equal(eclf1.transform(X1), np.array([[[0.7, 0.3], [0.3, 0.7]],
Member

Should we test the output of "hard-voting" and transform as well?

eclf1.set_params(lr__C=10.0)
eclf2.set_params(nb__max_depth=5)

assert_true(eclf1.estimators[0][1].get_params()['C'] == 10.0)
Member

Should we also test the get_params() interface of the VotingClassifier directly? More specifically, eclf1.get_params()["lr__C"] and eclf1.get_params()["lr"].get_params()["C"]? The get_params() interface seems untested.

names, clfs = zip(*self.estimators)
self._validate_names(names)

isnone = np.array([1 if clf is None else 0
Member

nitpick:

isnone = np.sum([clf is None for _, clf in self.estimators])
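With the nitpick applied, the validation could read roughly as follows; ``validate_estimators`` is a hypothetical free-standing helper for illustration (in the PR the check lives inside ``fit``):

```python
import numpy as np

def validate_estimators(estimators):
    # Count None entries with the reviewer's suggested one-liner and
    # reject the ensemble if no usable estimator remains.
    n_none = np.sum([clf is None for _, clf in estimators])
    if n_none == len(estimators):
        raise ValueError('All estimators are None. At least one is '
                         'required to be a classifier!')

validate_estimators([('lr', object()), ('rf', None)])  # passes silently
```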

@MechCoder
Member

Extremely sorry for the long wait @yl565 ! There seem to be two additions in this PR:

  1. Ability to substitute or set a classifier directly using the set_params interface.
  2. Ability to set the classifier to None or disabling it using set_params.

Given that in master, both these fail silently, should we document these as new features or bug-fixes?
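The mechanism behind both additions can be sketched in a few lines of plain Python; ``CompositionSketch`` is a stand-in for ``_BaseComposition``, not scikit-learn's actual implementation, and the string estimators are placeholders for real classifier instances.

```python
# Stand-in for the set_params mechanism this PR adds: a named step can
# be replaced wholesale (addition 1), or disabled by setting it to None
# (addition 2), instead of failing silently.
class CompositionSketch:
    def __init__(self, estimators):
        self.estimators = list(estimators)  # (name, estimator-or-None)

    def set_params(self, **params):
        names = [name for name, _ in self.estimators]
        for key, value in params.items():
            if key in names:
                # Substitute the named estimator, or disable it with None.
                self.estimators[names.index(key)] = (key, value)
            else:
                raise ValueError('Invalid parameter %s' % key)
        return self

ensemble = CompositionSketch([('lr', 'lr-estimator'), ('rf', 'rf-estimator')])
ensemble.set_params(lr='sgd-estimator')  # 1. substitute a classifier
ensemble.set_params(rf=None)             # 2. disable a classifier
print(ensemble.estimators)  # [('lr', 'sgd-estimator'), ('rf', None)]
```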

@jnothman
Member

jnothman commented Apr 4, 2017 via email

@MechCoder
Member

Ah I see why, in that case we should also test that

vclf.set_params(nb=clf2)
assert_false(hasattr(vclf, "nb"))

Also, a weird corner case, but it might be a good idea to also test what happens when one sets estimators, the classifiers by themselves and the hyperparameters of the classifiers at once.

@@ -252,17 +270,13 @@ def transform(self, X):
else:
return self._predict(X)

def set_params(self, **params):
Member

Can you document this?

for key, value in six.iteritems(step.get_params(deep=True)):
out['%s__%s' % (name, key)] = value
return out
return super(VotingClassifier,
Member

Can you document this?

@yl565
Contributor Author

yl565 commented Apr 5, 2017

@MechCoder I added the tests and documentation. Could you please explain more on the following two of your comments?

Can you also check eclf2.estimators_, eclf2.estimators and eclf2.get_params()

Also, a weird corner case, but it might be a good idea to also test what happens when one sets estimators, the classifiers by themselves and the hyperparameters of the classifiers at once.

('nb', clf3)],
voting='hard', weights=[1, 1, 0.5])
eclf2.set_params(rf=None).fit(X, y)
assert_array_equal(eclf1.predict(X), eclf2.predict(X))
Member

I was suggesting to test the behaviour of eclf2.estimators, eclf2.estimators_ and ``eclf2.get_params()``.

assert_true(dict(eclf2.estimators)["rf"] is None)
assert_true(len(eclf2.estimators_) == 2)
assert_true(all(not isinstance(est, RandomForestClassifier)
                for est in eclf2.estimators_))
assert_true(eclf2.get_params()["rf"] is None)

def set_params(self, **params):
""" Setting the parameters for the voting classifier

Valid parameter keys can be listed with get_params().
Member

Can you add under the get_params heading? I would just say "Get the parameters of the VotingClassifier". In addition, I would also document the parameter deep saying that setting it to True gets the various classifiers and the parameters of the classifiers as well.

Specific parameters using e.g. set_params(parameter_name=new_value)
Estimators can be removed by setting them to None. In the following
example, the RandomForestClassifier is removed:
clf1 = LogisticRegression()
Member

I am fairly sure that you have to add this under a side-heading "Examples" so that it renders properly

Parameters
----------
params: keyword arguments
Specific parameters using e.g. set_params(parameter_name=new_value)
Member

@MechCoder MechCoder Apr 7, 2017

In addition to setting the parameters of the ``VotingClassifier``, the individual classifiers can also be set, or replaced by setting them to None.

@MechCoder
Member

That is it from me! Please add a whatsnew under Enhancements and rebase properly so that I can merge.

@MechCoder
Member

You just need to do an interactive rebase

git rebase -i master

It will spit out errors wherever there are merge conflicts. You would need to check wherever the following block is present.

<<<<<<< HEAD
some code
=======
some code
>>>>>>> your-branch

and decide how you would like to merge the two blocks. Then you do

git add name_of_file
git rebase --continue

and keep continuing.

@jnothman
Member

jnothman commented Apr 8, 2017 via email

@MechCoder
Member

Travis is failing for some cosmetic reasons. Could you fix that?

@yl565
Contributor Author

yl565 commented Apr 10, 2017

Thanks @MechCoder

@MechCoder MechCoder merged commit 194c231 into scikit-learn:master Apr 10, 2017
@MechCoder
Member

Thank you @yl565 !!

massich pushed a commit to massich/scikit-learn that referenced this pull request Apr 11, 2017
…cikit-learn#7674)

* PR to 7288
Use _BaseComposition as base

*  Fix flakes problem

* Change ``pipeline``, add more tests and other changes
1. Use ``_BaseComposition`` in class ``Pipeline`` and ``FeatureUnion``
2. Add tests of soft voting ``transform`` when one estimator is set to None
3. Add estimator name validation in ``_BaseComposition`` and tests
4. Other requested changes.

* Remove the unused import warn

* Add more test and documentation

* resolve conflict with master

* Add testing cases and modify documentation

* Add to whats_new.rst

*  Fix too many blank lines
@prcastro

🎉

massich pushed a commit to massich/scikit-learn that referenced this pull request Apr 26, 2017
Sundrique pushed a commit to Sundrique/scikit-learn that referenced this pull request Jun 14, 2017
dmohns pushed a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017
dmohns pushed a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017
NelleV pushed a commit to NelleV/scikit-learn that referenced this pull request Aug 11, 2017
paulha pushed a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017
maskani-moh pushed a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017
jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017
5 participants