Skip to content

Sample_weight isn't overwritten anymore in logistic_regression #18480

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from
Closed

Sample_weight isn't overwritten anymore in logistic_regression #18480

wants to merge 2 commits into from

Conversation

ghost
Copy link

@ghost ghost commented Sep 28, 2020

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Fixes #18347

Any other comments?

Copy link
Member

@rth rth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Ansur!

Also please add an entry to the change log at doc/whats_new/v0.24.rst. Like the other entries there, please reference this pull request with :pr: and credit yourself with :user:.

multi_class=multi_class)
clf.fit(X, y, sample_weight=W)
actual = W.sum()
assert expected == actual, 'Sum of weight before ({}) should be the same as sum if weight after ({})'.format(expected, actual)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to keep lines shorter than 80 char, and also to avoid exact float comparison

Suggested change
assert expected == actual, 'Sum of weight before ({}) should be the same as sum if weight after ({})'.format(expected, actual)
msg = (
f'Sum of weight before ({expected}) should be the same as'
f'sum if weight after ({actual})'
)
assert_allclose(expected, actual, err_msg=msg)

Copy link
Author

@ghost ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added whats_new info on pr 18480

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR @Ansur !

Comment on lines +1870 to +1883
@pytest.mark.parametrize("multi_class", {'ovr', 'multinomial', 'auto'})
def test_sample_weight_not_modified(multi_class):
X, y = load_iris(return_X_y=True)
np.random.seed(1234)
W = np.random.random(len(X)) * 10.0

for weight in [{0: 1.0, 1: 10.0, 2: 1.0}]:
for class_weight in (weight, 'balanced'):
expected = W.sum()

clf = LogisticRegression(random_state=0,
class_weight=class_weight,
max_iter=200,
multi_class=multi_class)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can take advantage of pytest.parametrize:

@pytest.mark.parametrize("multi_class", {'ovr', 'multinomial', 'auto'})
@pytest.mark.parametrize("class_weight", [
    {0: 1.0, 1: 10.0, 2: 1.0}, 'balanced'
])
def test_sample_weight_not_modified(multi_class, class_weight):
    X, y = load_iris(return_X_y=True)
    n_features = len(X)
    W = np.ones(n_features)
    W[:n_features // 2] = 2

    expected = W.sum()
	...

@@ -665,7 +665,7 @@ def _logistic_regression_path(X, y, pos_class=None, Cs=10, fit_intercept=True,
if isinstance(class_weight, dict) or multi_class == 'multinomial':
class_weight_ = compute_class_weight(class_weight,
classes=classes, y=y)
sample_weight *= class_weight_[le.fit_transform(y)]
sample_weight = sample_weight * class_weight_[le.fit_transform(y)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rth instead of doing this, would it be better to have copy=False/True within _check_sample_weight signature?

@adrinjalali adrinjalali deleted the branch scikit-learn:master January 22, 2021 10:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use of class_weight dictionary in LogisticRegression modifies sample_weight object
5 participants