[MRG] FIX sample_weight invariance for linear models #19616
Conversation
I pushed more tests to highlight all the remaining problems. In particular I added … I would need more time to write the weighted objective functions for all models (…). But feel free to give it a try without waiting for me.
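For reference, here is the kind of weighted-objective check I have in mind, sketched for Ridge only (a rough illustration, not the actual test code from this PR; the objective below matches Ridge up to a constant factor, which does not change the minimizer):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
X = rng.randn(30, 4)
y = rng.randn(30)
sw = rng.uniform(0.5, 2.0, size=30)

def weighted_ridge_objective(coef, intercept, alpha):
    # Weighted squared error plus an (unweighted) L2 penalty on coef.
    resid = y - X @ coef - intercept
    return np.sum(sw * resid ** 2) + alpha * np.dot(coef, coef)

reg = Ridge(alpha=1.0).fit(X, y, sample_weight=sw)
obj_at_fit = weighted_ridge_objective(reg.coef_, reg.intercept_, alpha=1.0)
# The fitted solution should not be improved by perturbing it.
obj_perturbed = weighted_ridge_objective(reg.coef_ + 0.05, reg.intercept_, alpha=1.0)
assert obj_at_fit <= obj_perturbed
```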
@ogrisel As opposed to benchmarking, there is no dark magic involved in this: the objective function of Enet is invariant to rescaling of the sample weights. We use this freedom: by a lucky hunch we end up with …
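To spell this out (my notation, with ρ = `l1_ratio`; the constants may differ slightly from the exact ones in the code base), the weighted Enet objective with the squared-error term normalized by the total weight reads:

```latex
% Weighted ElasticNet objective (sketch); s_i are the sample weights.
\min_w \; \frac{1}{2 \sum_i s_i} \sum_i s_i \, (y_i - x_i^\top w)^2
  \;+\; \alpha \rho \, \lVert w \rVert_1
  \;+\; \frac{\alpha (1 - \rho)}{2} \, \lVert w \rVert_2^2
```

Replacing every s_i by c·s_i leaves all three terms unchanged, so the weights can be rescaled freely, e.g. so that they sum to `n_samples`.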
@lorentzenchr do you have a bit of bandwidth to take over here?
I'll try.
This whole `normalize` …
@ogrisel @agramfort What do you think? Does this solve the issue? Or is 599a63f a showstopper for you? I think it's ok, and we want to get rid of `normalize`.
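For context, the replacement for `normalize=True` that the deprecation points to is explicit scaling in a pipeline, roughly as follows (a sketch; exact equivalence also involves rescaling `alpha`, which is what `_scale_alpha_inplace` handles in the tests):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
X, y = rng.randn(20, 3), rng.randn(20)
sw = rng.uniform(0.5, 2.0, size=20)

model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
# Both steps need the weights for the result to stay weight-invariant.
model.fit(X, y, standardscaler__sample_weight=sw, ridge__sample_weight=sw)
```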
thx @lorentzenchr for cleaning up this mess 🙏
@ogrisel Are you fine with the current status? If so, let's count each of our reviews as +1/2, which adds up to +1 => merge :smirk:
LGTM (may count as a half review :smirk:)
Thanks a lot @ogrisel @lorentzenchr !
Sorry for the lack of reply. I guess I unconsciously ignored the notifications related to this cursed PR. We probably still have bugs with …
This is a tentative fix for #17444, but doing so breaks a test added as part of #19426.

The new normalize code in the base model uses `sample_weight.sum()` instead of `X.shape[0]` to scale `X_var`. This effectively fixes the problem of #17444, as can be seen in the new test `sklearn/linear_model/tests/test_ridge.py::test_sample_weight_invariance`.
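A minimal illustration of the weighted variance computation (my own example, not the PR's code): with the `sample_weight.sum()` denominator, integer weights behave exactly like repeated rows.

```python
import numpy as np

X = np.array([[1.0], [2.0], [3.0]])
sw = np.array([1.0, 2.0, 1.0])  # row 1 counted twice

X_mean = np.average(X, axis=0, weights=sw)
# Weighted variance: divide by sw.sum(), not by X.shape[0].
X_var = np.average((X - X_mean) ** 2, axis=0, weights=sw)

X_rep = np.repeat(X, sw.astype(int), axis=0)  # materialize the repetition
assert np.allclose(X_var, X_rep.var(axis=0))
```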
However, that requires updating `_scale_alpha_inplace` to make `test_linear_model_sample_weights_normalize_in_pipeline` pass for ridge.

With that change, the test now fails for Lasso / ElasticNet. I suspect this is because of this dark magic:

scikit-learn/sklearn/linear_model/_coordinate_descent.py, lines 795 to 807 (at 1045d16)

but I am not sure what the correct fix is.

/cc @maikia @agramfort @rth
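The property being tested is, schematically (a condensed sketch; the actual `test_sample_weight_invariance` in this PR is parametrized over more estimators and settings):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(42)
X = rng.randn(20, 3)
y = rng.randn(20)
sw = rng.randint(1, 4, size=20)

# Fitting with integer weights should match fitting on repeated rows.
reg_w = Ridge(alpha=1.0).fit(X, y, sample_weight=sw)
reg_r = Ridge(alpha=1.0).fit(np.repeat(X, sw, axis=0), np.repeat(y, sw))
assert np.allclose(reg_w.coef_, reg_r.coef_)
```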