Cross-validation supports optional sample weights #7112


Closed

Conversation

@ilyaeck commented Jul 29, 2016

This fix enables the use of sample weights in cross-validation, i.e., in cross_val_score and cross_val_predict. Since most classifiers (estimators) explicitly accept sample weights in fit(), this capability is required to measure performance adequately in those cases.
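For context (this example is mine, not from the PR): scikit-learn's metric functions already accept a sample_weight argument, which is the building block that weighted scoring inside cross-validation would rely on. A minimal illustration:

```python
import numpy as np
from sklearn.metrics import accuracy_score

y_true = np.array([0, 0, 1, 1])
y_pred = np.array([0, 1, 1, 1])

# Unweighted: 3 of 4 samples are classified correctly
print(accuracy_score(y_true, y_pred))  # 0.75

# Weighting the misclassified sample heavily lowers the score:
# weighted accuracy = (1 + 1 + 1) / (1 + 3 + 1 + 1) = 0.5
w = np.array([1.0, 3.0, 1.0, 1.0])
print(accuracy_score(y_true, y_pred, sample_weight=w))  # 0.5
```

The point of the PR is that cross_val_score computes this metric per fold but offers no way to pass per-fold weights into it.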

ilyaeck added 4 commits July 28, 2016 11:10
@ilyaeck (Author) commented Jul 29, 2016

Failed tests should now be fixed.

```python
from sklearn.metrics.scorer import check_scoring
from sklearn.utils.fixes import bincount
from sklearn.gaussian_process.kernels import Kernel as GPKernel
from sklearn.exceptions import FitFailedWarning
```
Member

Why did you change this? Imports should always be relative.

@GaelVaroquaux (Member)
Thanks for your PR. It cannot be merged as it is.

You should modify the files in the model_selection module and leave the cross_validation.py file untouched, as it is there only for legacy.

You need to add tests for the new functionality.

You shouldn't change things like relative imports.

@amueller (Member)
There is a fit_params parameter that can contain sample_weight. So what you want to do can already be achieved, right?
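A sketch of what amueller describes (the data and estimator here are my own choices): weights can already reach fit() through fit_params, though scoring on each test fold remains unweighted. Newer scikit-learn releases replace fit_params with params, which the fallback below accounts for.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=100, random_state=0)
w = np.random.RandomState(0).uniform(0.5, 2.0, size=100)

clf = LogisticRegression(max_iter=1000)
try:
    # sample_weight is forwarded to clf.fit() on each training fold;
    # the per-fold score, however, still treats every test sample equally.
    scores = cross_val_score(clf, X, y, cv=5, fit_params={"sample_weight": w})
except TypeError:
    # Recent scikit-learn versions removed fit_params in favour of params
    scores = cross_val_score(clf, X, y, cv=5, params={"sample_weight": w})

print(scores.mean())
```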

@ilyaeck (Author) commented Jul 30, 2016

For prediction, I think you are right, but not for scoring. In other words, in cross_val_score you can inject sample weights into estimator fitting, but you cannot do weighted scoring. Does that make sense?
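The gap described here can be worked around by iterating over the folds manually and passing each fold's weights to both fit() and the metric. A rough sketch (the data and estimator are illustrative, not part of the PR):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold

X, y = make_classification(n_samples=100, random_state=0)
w = np.random.RandomState(1).uniform(0.5, 2.0, size=100)

scores = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True,
                                 random_state=0).split(X):
    clf = LogisticRegression(max_iter=1000).fit(
        X[train_idx], y[train_idx], sample_weight=w[train_idx])
    # Weighted scoring on the test fold: the step cross_val_score
    # did not support at the time of this discussion.
    scores.append(accuracy_score(y[test_idx], clf.predict(X[test_idx]),
                                 sample_weight=w[test_idx]))

print(np.mean(scores))
```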

I'll try to address Gael's comments and resubmit the PR.

@seanv507 commented Oct 9, 2018

@ilyaeck what happened to this pull request? Does fit_params handle weighted scoring? I am interested in the case where the sample weights are counts (for grouped data, to reduce memory/computation). In this case, AFAIK, the cross-validation process itself should be changed, i.e., to replicate uniformly sampling the ungrouped data.
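To illustrate the counts-as-weights use case (the example is mine, not from the thread): for estimators where sample_weight is exact, such as ordinary least squares, fitting grouped rows with integer count weights matches fitting the explicitly replicated, ungrouped rows. As noted above, though, cross-validation splitting would still treat each grouped row as a single sample, which is the part that would need changing.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
X = rng.randn(20, 3)
y = X @ np.array([1.0, -2.0, 0.5]) + rng.randn(20) * 0.1
counts = rng.randint(1, 5, size=20)  # occurrences of each grouped row

# Fit on grouped rows, using the counts as sample weights
grouped = LinearRegression().fit(X, y, sample_weight=counts)

# Fit on the explicitly replicated (ungrouped) data
X_rep = np.repeat(X, counts, axis=0)
y_rep = np.repeat(y, counts)
ungrouped = LinearRegression().fit(X_rep, y_rep)

# Both solve the same weighted normal equations
print(np.allclose(grouped.coef_, ungrouped.coef_))  # True
```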

@ilyaeck (Author) commented Oct 9, 2018 via email

@amueller amueller added the Needs Decision Requires decision label Aug 5, 2019
@amueller (Member) commented Aug 5, 2019

Closely related to #4497. I think this can safely be closed, as too many things have changed in the meantime. This is quite a tricky issue.

@amueller amueller closed this Aug 5, 2019