Cross-validation supports optional sample weights #7112
Conversation
This fix enables the use of sample weights in cross-validation, i.e., in cross_val_score and cross_val_predict. Since most classifiers (estimators) explicitly accept sample weights in fit(), this capability is needed to measure performance adequately in those cases.
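As context for the discussion below, here is a minimal sketch (not this PR's implementation) of how per-sample weights can already be routed to fit() through the fit_params argument of cross_val_score, assuming a scikit-learn version recent enough to slice array-like fit_params per fold. The weights only reach the estimator's fit(); the scorer still counts every test sample equally, which is the gap this PR is about.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, random_state=0)
weights = np.random.RandomState(0).rand(200)  # illustrative per-sample weights

# fit_params entries of length n_samples are indexed per fold, so each
# training split sees only its own weights; scoring remains unweighted.
scores = cross_val_score(
    LogisticRegression(),
    X, y,
    cv=5,
    fit_params={"sample_weight": weights},
)
print(scores)
```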
Failed tests should now be fixed.
from sklearn.metrics.scorer import check_scoring
from sklearn.utils.fixes import bincount
from sklearn.gaussian_process.kernels import Kernel as GPKernel
from sklearn.exceptions import FitFailedWarning
Why did you change this? Imports should always be relative.
Thanks for your PR. It cannot be merged as it is. You should modify the files in the model_selection module and leave the cross_validation.py file untouched, as it is there only for legacy. You need to add tests for the new functionality. You shouldn't change things like relative imports.
For prediction, I think you are right, but not for scoring. I'll try to address Gael's comments and resubmit the PR.

Ilya
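To make the prediction-vs-scoring distinction above concrete, here is a hedged sketch of what weighted scoring could look like with a plain KFold loop (an illustration, not the code proposed in this PR): the same weights are passed both to fit() and to the metric's sample_weight argument, so heavy samples influence both training and evaluation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold

X, y = make_classification(n_samples=200, random_state=0)
w = np.random.RandomState(0).rand(200)  # illustrative per-sample weights

scores = []
for train, test in KFold(n_splits=5).split(X):
    clf = LogisticRegression().fit(X[train], y[train], sample_weight=w[train])
    pred = clf.predict(X[test])
    # weighted score: each test sample contributes in proportion to its weight
    scores.append(accuracy_score(y[test], pred, sample_weight=w[test]))
print(np.mean(scores))
```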
@ilyaeck what happened to this pull request? Does fit_params handle weighted scoring? I am interested in the case where the sample weights are counts (for grouped data, to reduce memory/computation). In this case, AFAIK, the cross-validation process itself should be changed, i.e., to replicate uniformly sampling the ungrouped data.
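For the counts-as-weights case described above, a small illustration (hypothetical data, not from this PR) of why the splitting itself is affected: when each row stands for `count` identical samples, splitting the grouped rows is not equivalent to uniformly splitting the ungrouped data, because a row with a large count lands entirely in one fold.

```python
import numpy as np

# Three grouped rows, where each row represents `count` identical samples.
X_grouped = np.array([[0.0], [1.0], [2.0]])
y_grouped = np.array([0, 1, 1])
counts = np.array([5, 1, 3])

# The equivalent ungrouped data has sum(counts) = 9 rows; cross-validating
# on these 9 rows is what a weight-aware CV would ideally emulate.
X_full = np.repeat(X_grouped, counts, axis=0)
y_full = np.repeat(y_grouped, counts)
print(X_full.shape, y_full.shape)  # (9, 1) (9,)
```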
It never went through, so nothing happened, I'm afraid.
Ilya
Closely related to #4497. I think this can safely be closed, as too many things changed in the meantime. This is quite a tricky issue.