[MRG+1] Change VotingClassifier estimators by set_params #7674
Conversation
Use _BaseComposition as base
You shouldn't need to open a new PR for a rebase. I'll try to take a look at this over the coming weeks. Sorry for the slow reviews.
@jnothman, I have just started using git, do you mind letting me know if the following procedure is correct for a rebase (assume origin is already up to date with upstream/master):
the last should be …
Sorry this has taken a while to get to
@@ -44,7 +43,8 @@ class VotingClassifier(BaseEstimator, ClassifierMixin, TransformerMixin):
    estimators : list of (string, estimator) tuples
        Invoking the ``fit`` method on the ``VotingClassifier`` will fit clones
        of those original estimators that will be stored in the class attribute
        `self.estimators_`.
        `self.estimators_`. An estimator can be set to `None` using
these should use double-backticks.
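(In the reStructuredText used for scikit-learn docstrings, double backticks produce an inline literal, e.g. ``self.estimators_``, whereas single backticks are interpreted text and render differently.)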
        isnone = np.array([1 if clf is None else 0
                           for _, clf in self.estimators])
        if isnone.sum() == len(self.estimators):
            raise ValueError('All estimators is None. At least one is required'
'is None' -> 'are None'
@@ -161,11 +166,19 @@ def fit(self, X, y, sample_weight=None):

        self.estimators_ = Parallel(n_jobs=self.n_jobs)(
This change to estimators_ needs to be documented under Attributes.
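For instance, the Attributes entry could read along these lines (a sketch of possible wording, not the merged docstring):
estimators_ : list of classifiers
    The collection of fitted sub-estimators. Estimators that were set to
    None via ``set_params`` are skipped and do not appear in this list.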
        return self

    @property
    def _narej_weights(self):
What is narej?
short for rejecting NaN; maybe it should be changed to _rej_nan_weights to be clearer?
Confusing because there aren't any NaNs involved. How about _weights_not_none
... or just remove this helper
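For reference, a minimal sketch of the renamed helper as a property (it assumes the self.weights and self.estimators attributes of VotingClassifier; an illustration, not necessarily the merged code):

@property
def _weights_not_none(self):
    """Weights of the estimators that are not None, in matching order."""
    if self.weights is None:
        return None
    return [w for (_, est), w in zip(self.estimators, self.weights)
            if est is not None]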
__all__ = ['if_delegate_has_method']


class _BaseComposition(six.with_metaclass(ABCMeta, BaseEstimator)):
You should be using this in Pipeline and FeatureUnion.
Do you mean I should also modify Pipeline and FeatureUnion to use _BaseComposition?
yes
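For context, the idea is that _BaseComposition factors the (name, estimator)-list parameter handling out of Pipeline, FeatureUnion and VotingClassifier. A simplified sketch of that mechanism (illustrative only; the actual scikit-learn implementation differs in its details):

from abc import ABCMeta
from sklearn.base import BaseEstimator
from sklearn.externals import six


class _BaseComposition(six.with_metaclass(ABCMeta, BaseEstimator)):
    """Parameter handling for estimators holding (name, estimator) tuples."""

    def _set_params(self, attr, **params):
        # Replace whole estimators addressed by bare names first,
        # e.g. set_params(rf=None) or set_params(rf=clf2) ...
        names = [name for name, _ in getattr(self, attr)]
        for name in list(params.keys()):
            if '__' not in name and name in names:
                self._replace_estimator(attr, name, params.pop(name))
        # ... then delegate nested keys such as rf__n_estimators.
        super(_BaseComposition, self).set_params(**params)
        return self

    def _replace_estimator(self, attr, name, new_val):
        # Assign a fresh list so normal parameter semantics apply.
        new_estimators = list(getattr(self, attr))
        for i, (estimator_name, _) in enumerate(new_estimators):
            if estimator_name == name:
                new_estimators[i] = (name, new_val)
                break
        setattr(self, attr, new_estimators)

A subclass such as VotingClassifier would then implement set_params(**params) as return self._set_params('estimators', **params).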
eclf2.set_params(voting='soft').fit(X, y)
assert_array_equal(eclf1.predict(X), eclf2.predict(X))
assert_array_equal(eclf1.predict_proba(X), eclf2.predict_proba(X))
msg = ('All estimators is None. At least one is required'
is -> are
eclf1.set_params(voting='soft').fit(X, y)
eclf2.set_params(voting='soft').fit(X, y)
assert_array_equal(eclf1.predict(X), eclf2.predict(X))
assert_array_equal(eclf1.predict_proba(X), eclf2.predict_proba(X))
Please test soft transform. The outputs should differ between the 0-weight and None variants, though...
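A sketch of such a test (clf1, clf2, X, y stand for the usual fixtures in this test file; with voting='soft', transform returns one predict_proba block per fitted estimator, so the None variant should have one block fewer than the 0-weight variant):

from sklearn.ensemble import VotingClassifier
from sklearn.utils.testing import assert_equal

eclf_zero = VotingClassifier(estimators=[('lr', clf1), ('rf', clf2)],
                             voting='soft', weights=[1, 0]).fit(X, y)
eclf_none = VotingClassifier(estimators=[('lr', clf1), ('rf', clf2)],
                             voting='soft', weights=[1, 0])
eclf_none.set_params(rf=None).fit(X, y)
# The 0-weight variant keeps both probability blocks; None drops one.
assert_equal(len(eclf_zero.transform(X)), 2)
assert_equal(len(eclf_none.transform(X)), 1)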
1. Use ``_BaseComposition`` in class ``Pipeline`` and ``FeatureUnion``
2. Add tests of soft voting ``transform`` when one estimator is set to None
3. Add estimator name validation in ``_BaseComposition`` and tests
4. Other requested changes.
@jnothman I've made the requested changes. Also added estimator name validation and tests.
LGTM, thanks!
eclf2 = VotingClassifier(estimators=[('rf', clf2), ('nb', clf3)],
                         voting='soft', weights=[1, 0.5])
eclf2.set_params(rf=None).fit(X1, y1)
assert_array_equal(eclf1.transform(X1), np.array([[[0.7, 0.3], [0.3, 0.7]],
Hmm. Looking at this makes me wonder whether we should be multiplying the outputs by the weight. Not an issue for this PR.
Should we test the output of "hard-voting" and transform as well?
        self.voting = voting
        self.weights = weights
        self.n_jobs = n_jobs

    @property
    def named_estimators(self):
I wish this didn't exist, but I know it's not your problem.
You wish the property didn't exist? Or that it wasn't a property but a function?
I wish that we had not copied this bad design feature from pipeline!
Looking for a reviewer...
This also needs merging with updated master.
@jnothman I merged it with updated master. Can you help me check why the appveyor build was cancelled?
Build was cancelled because all our appveyors were stuck and were cancelled. Don't worry about it.
Anything preventing this from getting merged?
A backlog and low reviewer availability. We require two approvals.
Just some minor comments related to testing.
eclf2.set_params(nb=clf2).fit(X, y)
assert_array_equal(eclf1.predict(X), eclf2.predict(X))
assert_array_equal(eclf1.predict_proba(X), eclf2.predict_proba(X))
Can you directly check:
assert_equal(eclf2.estimators[0][1].get_params(), clf1.get_params())
assert_equal(eclf2.estimators[1][1].get_params(), clf2.get_params())
(It might be possible that two different classifiers give the same predictions)
                                     ('nb', clf3)],
                         voting='hard', weights=[1, 1, 0.5])
eclf2.set_params(rf=None).fit(X, y)
assert_array_equal(eclf1.predict(X), eclf2.predict(X))
Can you also check eclf2.estimators_, eclf2.estimators and eclf2.get_params()?
I was suggesting to test the behaviour of eclf2.estimators, eclf2.estimators_ and ``eclf2.get_params()``:
assert_true(dict(eclf2.estimators)["rf"] is None)
assert_true(len(eclf2.estimators_) == 2)
assert_true(all([not isinstance(est, RandomForestClassifier) for est in eclf2.estimators_]))
assert_true(eclf2.get_params()["rf"] is None)
eclf2 = VotingClassifier(estimators=[('rf', clf2), ('nb', clf3)],
                         voting='soft', weights=[1, 0.5])
eclf2.set_params(rf=None).fit(X1, y1)
assert_array_equal(eclf1.transform(X1), np.array([[[0.7, 0.3], [0.3, 0.7]],
Should we test the output of "hard-voting" and transform as well?
eclf1.set_params(lr__C=10.0)
eclf2.set_params(nb__max_depth=5)

assert_true(eclf1.estimators[0][1].get_params()['C'] == 10.0)
Should we also test the get_params() interface of the VotingClassifier directly? More specifically, eclf1.get_params()["lr__C"] and eclf1.get_params()["lr"].get_params()["C"]? The get_params() interface seems untested.
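Something along these lines, for instance (a sketch reusing the eclf1/clf1 fixture above, where lr__C was set to 10.0; assert helpers as in sklearn.utils.testing):

params = eclf1.get_params()
assert_equal(params['lr__C'], 10.0)                  # nested parameter
assert_true(params['lr'] is clf1)                    # the estimator itself
assert_equal(params['lr'].get_params()['C'], 10.0)   # round trip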
        names, clfs = zip(*self.estimators)
        self._validate_names(names)

        isnone = np.array([1 if clf is None else 0
nitpick:
isnone = np.sum([clf is None for _, clf in self.estimators])
Extremely sorry for the long wait @yl565! There seem to be two additions in this PR:
1. Ability to substitute or set a classifier directly using the set_params interface.
2. Ability to set the classifier to None or disabling it using set_params.
Given that in master, both of these fail silently, should we document these as new features or bug-fixes?
Likely they don't just fail silently, but set an attribute on the
estimator, even overwriting existing attributes! We had the same for
Pipeline until 0.18. But I still think, in this case, it's better regarded
as an enhancement.
…On 4 April 2017 at 14:18, Manoj Kumar ***@***.***> wrote:
Extremely sorry for the long wait @yl565 <https://github.com/yl565> !
There seems to be two additions in this PR:
1. Ability to substitute or set a classifier directly using the
set_params interface.
2. Ability to set the classifier to None or disabling it using
set_params.
Given that in master, both these fail silently, should we document these
as new features or bug-fixes?
Ah I see why. In that case we should also test that after vclf.set_params(nb=clf2):
assert_false(hasattr(vclf, "nb"))
Also, a weird corner case, but it might be a good idea to also test what happens when one sets the estimators, the classifiers by themselves, and the hyperparameters of the classifiers all at once.
@@ -252,17 +270,13 @@ def transform(self, X):
        else:
            return self._predict(X)

    def set_params(self, **params):
Can you document this?
            for key, value in six.iteritems(step.get_params(deep=True)):
                out['%s__%s' % (name, key)] = value
            return out
        return super(VotingClassifier,
Can you document this?
@MechCoder I added the tests and documentation. Could you please explain more on the following two of your comments?
                                     ('nb', clf3)],
                         voting='hard', weights=[1, 1, 0.5])
eclf2.set_params(rf=None).fit(X, y)
assert_array_equal(eclf1.predict(X), eclf2.predict(X))
I was suggesting to test the behaviour of eclf2.estimators, eclf2.estimators_ and ``eclf2.get_params()``:
assert_true(dict(eclf2.estimators)["rf"] is None)
assert_true(len(eclf2.estimators_) == 2)
assert_true(all([not isinstance(est, RandomForestClassifier) for est in eclf2.estimators_]))
assert_true(eclf2.get_params()["rf"] is None)
    def set_params(self, **params):
        """ Setting the parameters for the voting classifier

        Valid parameter keys can be listed with get_params().
Can you add this under the get_params heading? I would just say "Get the parameters of the VotingClassifier". In addition, I would also document the parameter deep, saying that setting it to True gets the various classifiers and the parameters of the classifiers as well.
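i.e. something like (suggested wording only, not the final docstring):

    def get_params(self, deep=True):
        """ Get the parameters of the VotingClassifier

        Parameters
        ----------
        deep : bool, optional (default=True)
            Setting it to True gets the various classifiers and the
            parameters of the classifiers as well.
        """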
        Specific parameters using e.g. set_params(parameter_name=new_value)
        Estimators can be removed by setting them to None. In the following
        example, the RandomForestClassifier is removed:
        clf1 = LogisticRegression()
I am fairly sure that you have to add this under a side-heading "Examples" so that it renders properly
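i.e. rendered under its own side-heading, roughly like this (layout sketch only):

        Examples
        --------
        >>> eclf = VotingClassifier(estimators=[('lr', clf1), ('rf', clf2)])
        >>> eclf = eclf.set_params(rf=None)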
        Parameters
        ----------
        params : keyword arguments
            Specific parameters using e.g. set_params(parameter_name=new_value)
In addition to setting the parameters of the ``VotingClassifier`` (with double-backticks), the individual classifiers of the ``VotingClassifier`` can also be set, or replaced by setting them to None.
That is it from me! Please add a whatsnew under Enhancements and rebase properly so that I can merge.
You just need to do an interactive rebase:
git rebase -i master
It will spit out errors wherever there are merge conflicts. You would need to check wherever the following block is present:
<<<<<<< HEAD
some code
=======
some code
>>>>>>> your-branch
and decide how you would like to merge the two blocks. Then you do:
git add name_of_file
git rebase --continue
and keep continuing.
No, just do git merge upstream/master if you can avoid a rebase. Much
easier.
…On 8 April 2017 at 05:52, Manoj Kumar ***@***.***> wrote:
You just need to do an interactive rebase
git rebase -i master
It will spit out errors wherever there are merge conflicts. You would need
to check wherever the following block is present.
<<<<<<< HEAD
some code
=======
some code
>>>>>>> your-branch
and decide how you would like to merge the two blocks. Then you do
git add name_of_file
git rebase --continue
and keep continuing.
Travis is failing for some cosmetic reasons. Could you fix that?
Thanks @MechCoder
Thank you @yl565!!
…cikit-learn#7674)

* PR to 7288 Use _BaseComposition as base
* Fix flakes problem
* Change ``pipeline``, add more tests and other changes
  1. Use ``_BaseComposition`` in class ``Pipeline`` and ``FeatureUnion``
  2. Add tests of soft voting ``transform`` when one estimator is set to None
  3. Add estimator name validation in ``_BaseComposition`` and tests
  4. Other requested changes.
* Remove the unused import warn
* Add more test and documentation
* resolve conflict with master
* Add testing cases and modify documentation
* Add to whats_new.rst
* Fix too many blank lines
🎉
PR to #7288. Continuation of #7484