-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
TST add balance property check for linear models #22892
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TST add balance property check for linear models #22892
Conversation
@rth @agramfort @TomDLT might be interested. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
regarding I don't know. @TomDLT should know better.
I'm not sure where the best place for such a test is. Other models like trees and isotonic regression (should) also fulfil this property and pass such a test. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Failing tests:
- For SGDRegressor, it is expected that the algorithm does not fully converge to a high-precision solution. It is due to the stochastic nature of the SGD algorithm, I would just skip the test.
- For SAG/SAGA, it is a bug, which is due to the fact that the random-sample selection does not take into account the sample weights. To fix the bug, we should implement an importance sampling scheme (see LogisticRegression with SAGA using sample_weight does not converge #21305).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thank you for adding this theoretical assertion, @lorentzenchr!
Reference Issues/PRs
none
What does this implement/fix? Explain your changes.
This adds a test for all suitable models in
sklearn.linear_model
that the balance property holds when fit withfit_intercept=True
:sum(predicted) = sum(observed)
on the training data.Any other comments?
Placed in
linear_model.test.test_common.py
.Failing tests are marked as
xfail
.