-
-
Notifications
You must be signed in to change notification settings - Fork 26k
Fix sample_weight overwriting bug #19182
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix sample_weight overwriting bug #19182
Conversation
@@ -1273,7 +1273,7 @@ def _check_psd_eigenvalues(lambdas, enable_warnings=False): | |||
return lambdas | |||
|
|||
|
|||
def _check_sample_weight(sample_weight, X, dtype=None): | |||
def _check_sample_weight(sample_weight, X, dtype=None, copy=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it be safer to always copy and not have this param ? It's not the first time we see this issue.
What do you think @glemaitre @rth ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say that it would be best if we can avoid copies. However, it might be safer to have copy=True
by default (not sure what it would involved thought).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also +1 for copy=False by default. The function is "check sample weight", nothing says it should make copies. So if someones makes inplace modifications of sample_weight
IMO it should be up to them to check that a copy was made previously.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that setting True by default requires that we should check every occurence of check_sample_weight to be sure it's not called in a tight loop where making a copy would be bad.
Maybe it's safer for keeping this PR simple to set False as default value. Sorry @m7142yosuke for the back and forth :/
This reverts commit ea65b81.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM thanks @m7142yosuke
@jeremiedbb |
Yes please add an entry in the v1.0 change log |
@m7142yosuke you have merge conflicts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @m7142yosuke! LGTM as well.
@@ -1273,7 +1273,7 @@ def _check_psd_eigenvalues(lambdas, enable_warnings=False): | |||
return lambdas | |||
|
|||
|
|||
def _check_sample_weight(sample_weight, X, dtype=None): | |||
def _check_sample_weight(sample_weight, X, dtype=None, copy=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also +1 for copy=False by default. The function is "check sample weight", nothing says it should make copies. So if someones makes inplace modifications of sample_weight
IMO it should be up to them to check that a copy was made previously.
We should probably backport in 0.24.2 instead of 1.0 |
Reference Issues/PRs
Fixes #18347
What does this implement/fix? Explain your changes.
Add
_check_sample_weight
tocopy=False/True
Any other comments?
#18480 is in progress, but I created this PR because the worker has been deleted