Sample_weight isn't overwritten anymore in logistic_regression #18480

ghost · 2020-09-28T14:28:47Z

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Any other comments?

rth

Also please add an entry to the change log at doc/whats_new/v0.24.rst. Like the other entries there, please reference this pull request with :pr: and credit yourself with :user:.

rth · 2020-09-28T14:32:51Z

sklearn/linear_model/tests/test_logistic.py

+                                    multi_class=multi_class)
+            clf.fit(X, y, sample_weight=W)
+            actual = W.sum()
+            assert expected == actual, 'Sum of weight before ({}) should be the same as sum if weight after ({})'.format(expected, actual)


We need to keep lines shorter than 80 characters, and also avoid exact float comparison.

Suggested change

-            assert expected == actual, 'Sum of weight before ({}) should be the same as sum if weight after ({})'.format(expected, actual)
+            msg = (
+                f'Sum of weight before ({expected}) should be the same as '
+                f'sum of weight after ({actual})'
+            )
+            assert_allclose(expected, actual, err_msg=msg)
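To illustrate why exact float comparison is discouraged here, the following is a minimal sketch, assuming only NumPy's `numpy.testing.assert_allclose`. The values `0.1`, `0.2` and `0.3` are illustrative and do not come from the PR; the point is that two mathematically equal results can differ in their last bits, so `==` fails where a tolerance-based check passes.

```python
from numpy.testing import assert_allclose

# Illustrative values only (not taken from the PR).
expected = 0.3
actual = 0.1 + 0.2  # differs from 0.3 in the last bits

# Exact comparison is brittle for floating-point results.
exact_match = (expected == actual)

# Tolerance-based comparison (default rtol=1e-7) accepts the tiny difference.
msg = (
    f'Sum of weight before ({expected}) should be the same as '
    f'sum of weight after ({actual})'
)
assert_allclose(expected, actual, err_msg=msg)
```

Building the message from two adjacent f-strings also keeps each line under the 80-character limit; note the trailing space at the end of the first fragment, since adjacent string literals are concatenated without a separator.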

ghost

Added whats_new info on pr 18480

thomasjpfan

Thank you for the PR @Ansur !

thomasjpfan · 2020-09-29T18:21:53Z

sklearn/linear_model/tests/test_logistic.py

+@pytest.mark.parametrize("multi_class", {'ovr', 'multinomial', 'auto'})
+def test_sample_weight_not_modified(multi_class):
+    X, y = load_iris(return_X_y=True)
+    np.random.seed(1234)
+    W = np.random.random(len(X)) * 10.0
+
+    for weight in [{0: 1.0, 1: 10.0, 2: 1.0}]:
+        for class_weight in (weight, 'balanced'):
+            expected = W.sum()
+
+            clf = LogisticRegression(random_state=0,
+                                    class_weight=class_weight,
+                                    max_iter=200,
+                                    multi_class=multi_class)


We can take advantage of pytest.mark.parametrize:

@pytest.mark.parametrize("multi_class", {'ovr', 'multinomial', 'auto'})
@pytest.mark.parametrize("class_weight", [
    {0: 1.0, 1: 10.0, 2: 1.0}, 'balanced'
])
def test_sample_weight_not_modified(multi_class, class_weight):
    X, y = load_iris(return_X_y=True)
    n_features = len(X)
    W = np.ones(n_features)
    W[:n_features // 2] = 2

    expected = W.sum()
    ...
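As a rough sketch of the suggestion above: stacking two `pytest.mark.parametrize` decorators makes pytest generate one test per combination of values, which replaces the nested `for` loops. To keep the example self-contained, `fit_stub` is a hypothetical stand-in for `clf.fit` and is not part of scikit-learn. The sketch also assumes a list rather than a set for the parameter values (so the generated test order is deterministic) and names the length `n_samples`, since `len(X)` is the number of samples.

```python
import numpy as np
import pytest
from numpy.testing import assert_allclose


def fit_stub(sample_weight, class_weight, multi_class):
    """Hypothetical stand-in for clf.fit (not a scikit-learn function)."""
    factor = 2.0 if class_weight == 'balanced' else 10.0
    # Out-of-place multiply: returns a new array, input stays untouched.
    return sample_weight * factor


@pytest.mark.parametrize("multi_class", ['ovr', 'multinomial', 'auto'])
@pytest.mark.parametrize("class_weight", [
    {0: 1.0, 1: 10.0, 2: 1.0}, 'balanced'
])
def test_sample_weight_not_modified(multi_class, class_weight):
    n_samples = 150  # assumed size, matching the iris dataset
    W = np.ones(n_samples)
    W[:n_samples // 2] = 2
    expected = W.sum()

    fit_stub(W, class_weight, multi_class)

    # The caller's weights must be unchanged after fitting.
    assert_allclose(expected, W.sum())
```

With three values for `multi_class` and two for `class_weight`, pytest collects six test cases from this single function.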

glemaitre · 2020-10-07T18:19:31Z

sklearn/linear_model/_logistic.py

@@ -665,7 +665,7 @@ def _logistic_regression_path(X, y, pos_class=None, Cs=10, fit_intercept=True,
    if isinstance(class_weight, dict) or multi_class == 'multinomial':
        class_weight_ = compute_class_weight(class_weight,
                                             classes=classes, y=y)
-        sample_weight *= class_weight_[le.fit_transform(y)]
+        sample_weight = sample_weight * class_weight_[le.fit_transform(y)]


@rth instead of doing this, would it be better to have copy=False/True within _check_sample_weight signature?
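The difference between the two lines in the diff can be sketched as follows. On a NumPy array, `*=` writes into the buffer the caller passed in, while `x = x * y` allocates a new array and only rebinds the local name. The helper `check_weights` with its `copy` flag is a hypothetical illustration of the alternative raised in the comment above; it is an assumption for this sketch and not the actual signature of `_check_sample_weight`.

```python
import numpy as np


def scale_in_place(sample_weight, factor):
    # Buggy pattern: `*=` modifies the caller's array.
    sample_weight *= factor
    return sample_weight


def scale_rebind(sample_weight, factor):
    # Fixed pattern: `*` allocates a new array; the input is untouched.
    sample_weight = sample_weight * factor
    return sample_weight


def check_weights(sample_weight, copy=False):
    # Hypothetical validation helper with a `copy` flag (an assumption,
    # not scikit-learn's API).
    sample_weight = np.asarray(sample_weight, dtype=np.float64)
    return sample_weight.copy() if copy else sample_weight


w = np.ones(4)

scale_rebind(w, 3.0)
after_rebind = w.copy()  # snapshot of the caller's array

# Copying at validation time makes later in-place updates safe too.
scale_in_place(check_weights(w, copy=True), 3.0)
after_copy = w.copy()

scale_in_place(w, 3.0)  # mutates w itself
```

Either approach protects the caller's array; copying during validation costs one allocation up front but lets every later in-place operation stay as it is.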

issue 18347 - sample_weight isn't overwritten anymore

ea7ed6e

github-actions bot added the module:linear_model label Sep 28, 2020

rth reviewed Sep 28, 2020

added whats_new info on pr 6624

2c14aa1

ghost commented Sep 28, 2020

thomasjpfan reviewed Sep 29, 2020

glemaitre reviewed Oct 7, 2020

m7142yosuke mentioned this pull request Jan 16, 2021

Fix sample_weight overwriting bug #19182

Merged

adrinjalali closed this Jan 22, 2021

adrinjalali deleted the branch scikit-learn:master January 22, 2021 10:54
