Skip to content

[MRG+1] Pass sample_weight as kwargs in VotingClassifier #9493

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

jschendel
Copy link
Contributor

Reference Issue

Fixes #9474

What does this implement/fix? Explain your changes.

Modified VotingClassifier to pass sample_weight as kwargs to estimators. Current behavior is to pass sample_weight as a positional arg, which is inconsistent with the rest of the codebase and could cause problems if a signature of an underlying estimator changes.

Any other comments?

First contribution to scikit-learn, let me know if there are any best practices that I'm not following.

@amueller
Copy link
Member

amueller commented Aug 4, 2017

Thanks, looks good. I'm not sure if it's overkill, but it might be nice to have a non-regression test with a mock estimator.

@jschendel jschendel force-pushed the votingclassifier-sample_weight-kwargs branch from dac3723 to 3b17b16 Compare August 5, 2017 00:12
@jschendel
Copy link
Contributor Author

jschendel commented Aug 5, 2017

@amueller : Added a non-regression test. Thanks, I originally wanted to write a test for this, but was unsure how to do so, or if one was even needed. Using a mock estimator is a good idea.

EDIT:
Looks like CI failed due to some syntax I used in my test which is invalid in Python 2. Specifically fit(self, X, y, *args, sample_weight=None) fails because *args must be placed after named parameters in Python 2 (this was relaxed in Python 3). The idea behind using *args was to eat any additional positionals that are passed, forcing sample_weight to be supplied by name.

Due to this check in VotingClassifier, the fit method of all underlying estimators must have sample_weight as a named parameter, so I can't just replace sample_weight=None with **kwargs. Using placeholders to eat an additional positionals, i.e. fit(self, X, y, _temp1=None, ..., _tempN=None, sample_weight=None), doesn't seem like a robust solution because if enough positionals are supplied one would eventually get assigned to sample_weight.

Any ideas on how I can verify that sample_weight is passed as kwargs in Python 2, while maintaining sample_weight as a named parameter?

@codecov
Copy link

codecov bot commented Aug 5, 2017

Codecov Report

Merging #9493 into master will decrease coverage by 0.41%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #9493      +/-   ##
==========================================
- Coverage   96.16%   95.75%   -0.42%     
==========================================
  Files         335      335              
  Lines       61826    61833       +7     
==========================================
- Hits        59455    59206     -249     
- Misses       2371     2627     +256
Impacted Files Coverage Δ
sklearn/ensemble/voting_classifier.py 98.85% <100%> (ø) ⬆️
sklearn/ensemble/tests/test_voting_classifier.py 99.56% <100%> (+0.01%) ⬆️
sklearn/utils/_scipy_sparse_lsqr_backport.py 4.02% <0%> (-73.87%) ⬇️
sklearn/utils/fixes.py 42.34% <0%> (-38.74%) ⬇️
sklearn/_build_utils/__init__.py 55.1% <0%> (-6.13%) ⬇️
sklearn/utils/tests/test_deprecation.py 96.77% <0%> (-3.23%) ⬇️
sklearn/datasets/mldata.py 93.75% <0%> (-2.5%) ⬇️
sklearn/datasets/tests/test_svmlight_format.py 98.4% <0%> (-1.6%) ⬇️
sklearn/utils/tests/test_estimator_checks.py 95.36% <0%> (-1.33%) ⬇️
sklearn/manifold/spectral_embedding_.py 83.66% <0%> (-1.31%) ⬇️
... and 20 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 00358b1...5309384. Read the comment docs.

@jschendel
Copy link
Contributor Author

I was able to get around the SyntaxError by using fit(self, X, y, *args, **sample_weight), which passes the check by VotingClassifier to verify that the fit function of underlying estimators supports sample weights. The modified test fails on master and passes with my changes.

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM. And thanks for the non-regression test.

@jnothman jnothman changed the title [MRG] Pass sample_weight as kwargs in VotingClassifier [MRG+1] Pass sample_weight as kwargs in VotingClassifier Aug 5, 2017
@jnothman jnothman merged commit 0bfe10d into scikit-learn:master Aug 5, 2017
jnothman pushed a commit to jnothman/scikit-learn that referenced this pull request Aug 6, 2017
dmohns pushed a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017
dmohns pushed a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017
@jschendel jschendel deleted the votingclassifier-sample_weight-kwargs branch August 7, 2017 17:46
yarikoptic added a commit to yarikoptic/scikit-learn that referenced this pull request Aug 12, 2017
Release 0.19.0

* tag '0.19.0': (99 commits)
  DOC one more version issue in doc
  skip docstring tests because not useful to users and has some issues
  deprecation of n_components happened in 0.19 not 0.18 (scikit-learn#9527)
  sync whatsnew with master so I'm less confused
  DOC more navigation links
  DOC a note on data leakage and pipeline (scikit-learn#9510)
  DOC set release date to Friday
  DOC Update news and menu for 0.19 release
  DOC list of contributors to 0.19
  DOC Change release date to Thursday
  DOC Remove some whitespace from what's new
  Update what's new for recent changes
  Use base.is_classifier instead instead of isinstance (scikit-learn#9482)
  Fix safe_indexing with read-only indices (scikit-learn#9507)
  [MRG+1] add scorer based on explained_variance_score (scikit-learn#9259)
  fix wrong assert in test_validation (scikit-learn#9480)
  [MRG+1] FIX Add missing mixins to ClassifierChain (scikit-learn#9473)
  Bring last code block in line with the image. (scikit-learn#9488)
  FIX Pass sample_weight as kwargs in VotingClassifier (scikit-learn#9493)
  FIX Incorrent implementation of noise_variance_ in PCA._fit_truncated (scikit-learn#9108)
  ...
yarikoptic added a commit to yarikoptic/scikit-learn that referenced this pull request Aug 12, 2017
* releases: (99 commits)
  DOC one more version issue in doc
  skip docstring tests because not useful to users and has some issues
  deprecation of n_components happened in 0.19 not 0.18 (scikit-learn#9527)
  sync whatsnew with master so I'm less confused
  DOC more navigation links
  DOC a note on data leakage and pipeline (scikit-learn#9510)
  DOC set release date to Friday
  DOC Update news and menu for 0.19 release
  DOC list of contributors to 0.19
  DOC Change release date to Thursday
  DOC Remove some whitespace from what's new
  Update what's new for recent changes
  Use base.is_classifier instead instead of isinstance (scikit-learn#9482)
  Fix safe_indexing with read-only indices (scikit-learn#9507)
  [MRG+1] add scorer based on explained_variance_score (scikit-learn#9259)
  fix wrong assert in test_validation (scikit-learn#9480)
  [MRG+1] FIX Add missing mixins to ClassifierChain (scikit-learn#9473)
  Bring last code block in line with the image. (scikit-learn#9488)
  FIX Pass sample_weight as kwargs in VotingClassifier (scikit-learn#9493)
  FIX Incorrent implementation of noise_variance_ in PCA._fit_truncated (scikit-learn#9108)
  ...
yarikoptic added a commit to yarikoptic/scikit-learn that referenced this pull request Aug 12, 2017
* dfsg: (99 commits)
  DOC one more version issue in doc
  skip docstring tests because not useful to users and has some issues
  deprecation of n_components happened in 0.19 not 0.18 (scikit-learn#9527)
  sync whatsnew with master so I'm less confused
  DOC more navigation links
  DOC a note on data leakage and pipeline (scikit-learn#9510)
  DOC set release date to Friday
  DOC Update news and menu for 0.19 release
  DOC list of contributors to 0.19
  DOC Change release date to Thursday
  DOC Remove some whitespace from what's new
  Update what's new for recent changes
  Use base.is_classifier instead instead of isinstance (scikit-learn#9482)
  Fix safe_indexing with read-only indices (scikit-learn#9507)
  [MRG+1] add scorer based on explained_variance_score (scikit-learn#9259)
  fix wrong assert in test_validation (scikit-learn#9480)
  [MRG+1] FIX Add missing mixins to ClassifierChain (scikit-learn#9473)
  Bring last code block in line with the image. (scikit-learn#9488)
  FIX Pass sample_weight as kwargs in VotingClassifier (scikit-learn#9493)
  FIX Incorrent implementation of noise_variance_ in PCA._fit_truncated (scikit-learn#9108)
  ...
paulha pushed a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017
AishwaryaRK pushed a commit to AishwaryaRK/scikit-learn that referenced this pull request Aug 29, 2017
maskani-moh pushed a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017
jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

VotingClassifier should pass sample_weight to estimators as kwargs
3 participants