-
-
Notifications
You must be signed in to change notification settings - Fork 26k
Sample_weight isn't overwritten anymore in logistic_regression #18480
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Ansur!
Also please add an entry to the change log at doc/whats_new/v0.24.rst
. Like the other entries there, please reference this pull request with :pr:
and credit yourself with :user:
.
multi_class=multi_class) | ||
clf.fit(X, y, sample_weight=W) | ||
actual = W.sum() | ||
assert expected == actual, 'Sum of weight before ({}) should be the same as sum if weight after ({})'.format(expected, actual) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to keep lines shorter than 80 char, and also to avoid exact float comparison
assert expected == actual, 'Sum of weight before ({}) should be the same as sum if weight after ({})'.format(expected, actual) | |
msg = ( | |
f'Sum of weight before ({expected}) should be the same as' | |
f'sum if weight after ({actual})' | |
) | |
assert_allclose(expected, actual, err_msg=msg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added whats_new info on pr 18480
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the PR @Ansur !
@pytest.mark.parametrize("multi_class", {'ovr', 'multinomial', 'auto'}) | ||
def test_sample_weight_not_modified(multi_class): | ||
X, y = load_iris(return_X_y=True) | ||
np.random.seed(1234) | ||
W = np.random.random(len(X)) * 10.0 | ||
|
||
for weight in [{0: 1.0, 1: 10.0, 2: 1.0}]: | ||
for class_weight in (weight, 'balanced'): | ||
expected = W.sum() | ||
|
||
clf = LogisticRegression(random_state=0, | ||
class_weight=class_weight, | ||
max_iter=200, | ||
multi_class=multi_class) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can take advantage of pytest.parametrize
:
@pytest.mark.parametrize("multi_class", {'ovr', 'multinomial', 'auto'})
@pytest.mark.parametrize("class_weight", [
{0: 1.0, 1: 10.0, 2: 1.0}, 'balanced'
])
def test_sample_weight_not_modified(multi_class, class_weight):
X, y = load_iris(return_X_y=True)
n_features = len(X)
W = np.ones(n_features)
W[:n_features // 2] = 2
expected = W.sum()
...
@@ -665,7 +665,7 @@ def _logistic_regression_path(X, y, pos_class=None, Cs=10, fit_intercept=True, | |||
if isinstance(class_weight, dict) or multi_class == 'multinomial': | |||
class_weight_ = compute_class_weight(class_weight, | |||
classes=classes, y=y) | |||
sample_weight *= class_weight_[le.fit_transform(y)] | |||
sample_weight = sample_weight * class_weight_[le.fit_transform(y)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rth instead of doing this, would it be better to have copy=False/True
within _check_sample_weight
signature?
Reference Issues/PRs
What does this implement/fix? Explain your changes.
Fixes #18347
Any other comments?