Fix sample_weight overwriting bug #19182

Merged
merged 10 commits into scikit-learn:main on Feb 1, 2021

Conversation

m7142yosuke
Contributor

Reference Issues/PRs

Fixes #18347

What does this implement/fix? Explain your changes.

Add a copy parameter (copy=False/True) to _check_sample_weight

Any other comments?

#18480 was in progress, but I created this PR because the original branch has been deleted.
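
For illustration, a minimal usage sketch of the new flag (a sketch only: `_check_sample_weight` is a private helper in `sklearn.utils.validation`, subject to change, and the `copy` parameter is only available in versions that include this change; `copy=False` remains the default):

```python
import numpy as np
from sklearn.utils.validation import _check_sample_weight  # private helper, subject to change

X = np.zeros((3, 2))
user_weights = np.array([1.0, 2.0, 3.0])

# With copy=True the validated weights are a new array, so any downstream
# in-place scaling cannot alter the array the caller passed in.
sw = _check_sample_weight(user_weights, X, copy=True)
sw *= 2.0

print(user_weights)  # still [1. 2. 3.]
print(sw)            # [2. 4. 6.]
```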

@@ -1273,7 +1273,7 @@ def _check_psd_eigenvalues(lambdas, enable_warnings=False):
     return lambdas


-def _check_sample_weight(sample_weight, X, dtype=None):
+def _check_sample_weight(sample_weight, X, dtype=None, copy=False):
Member

Wouldn't it be safer to always copy and not have this parameter? It's not the first time we've seen this issue.

What do you think @glemaitre @rth ?

Member

I would say it would be best if we can avoid copies. However, it might be safer to have copy=True by default (not sure what that would involve, though).

Member

Also +1 for copy=False by default. The function is "check sample weight"; nothing says it should make copies. So if someone makes in-place modifications of sample_weight, IMO it should be up to them to ensure a copy was made beforehand.
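
For context, here is a minimal sketch of the caller-side problem this guards against, based on the linked report #18347 (before this fix, passing a class_weight dict could rescale the user's sample_weight array in place):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
sample_weight = np.ones(4)

# On affected versions, fitting with a class_weight dict modified
# sample_weight in place; with this fix the caller's array is left untouched.
LogisticRegression(class_weight={0: 1.0, 1: 2.0}).fit(X, y, sample_weight=sample_weight)

print(sample_weight)  # expected after the fix: [1. 1. 1. 1.]
```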

Member

@jeremiedbb jeremiedbb left a comment

I agree that setting True by default would require checking every occurrence of _check_sample_weight to make sure it's not called in a tight loop where making a copy would be costly.
Maybe, to keep this PR simple, it's safer to set False as the default value. Sorry @m7142yosuke for the back and forth :/
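
To make the trade-off concrete, a sketch of how the copy=False default can play out at a call site (illustrative names, not the exact scikit-learn internals): keep the cheap no-copy path everywhere, and opt into copy=True only where the weights are modified in place afterwards:

```python
import numpy as np
from sklearn.utils.validation import _check_sample_weight  # private helper, subject to change

def _apply_class_weights(X, y_encoded, sample_weight, class_weight_vector):
    # Request a copy only because we scale the weights in place below;
    # plain validation call sites can keep the default copy=False.
    sample_weight = _check_sample_weight(sample_weight, X, dtype=X.dtype, copy=True)
    sample_weight *= class_weight_vector[y_encoded]
    return sample_weight

X = np.zeros((4, 2))
y_encoded = np.array([0, 0, 1, 1])
user_weights = np.ones(4)
out = _apply_class_weights(X, y_encoded, user_weights, np.array([1.0, 2.0]))

print(user_weights)  # unchanged: [1. 1. 1. 1.]
print(out)           # [1. 1. 2. 2.]
```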

Base automatically changed from master to main January 22, 2021 10:53
Member

@jeremiedbb jeremiedbb left a comment

LGTM thanks @m7142yosuke

@m7142yosuke
Contributor Author

@jeremiedbb
Do I need to add a changelog entry?

@jeremiedbb
Member

Do I need to add a changelog entry?

Yes, please add an entry in the v1.0 changelog.

@jeremiedbb
Member

jeremiedbb commented Jan 26, 2021

@m7142yosuke you have merge conflicts.

Member

@rth rth left a comment

Thanks @m7142yosuke! LGTM as well.

@rth rth merged commit 81a56bd into scikit-learn:main Feb 1, 2021
@glemaitre glemaitre removed their request for review February 11, 2021 12:15
@glemaitre
Copy link
Member

We should probably backport this to 0.24.2 instead of 1.0.

@glemaitre glemaitre added the To backport label (PR merged in master that need a backport to a release branch defined based on the milestone) Feb 11, 2021
@glemaitre glemaitre added this to the 0.24.2 milestone Feb 11, 2021
Labels
module:linear_model, module:utils, To backport
Development

Successfully merging this pull request may close these issues.

Use of class_weight dictionary in LogisticRegression modifies sample_weight object
4 participants