Skip to content

TST add balance property check for linear models #22892

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

lorentzenchr
Copy link
Member

Reference Issues/PRs

none

What does this implement/fix? Explain your changes.

This adds a test for all suitable models in sklearn.linear_model that the balance property holds when fit with fit_intercept=True: sum(predicted) = sum(observed) on the training data.

Any other comments?

Placed in linear_model.test.test_common.py.
Failing tests are marked as xfail.

@lorentzenchr
Copy link
Member Author

@rth @agramfort @TomDLT might be interested.

Copy link
Member

@agramfort agramfort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

regarding I don't know. @TomDLT should know better.

@lorentzenchr
Copy link
Member Author

I'm not sure where the best place for such a test is. Other models like trees and isotonic regression (should) also fulfil this property and pass such a test.

Copy link
Member

@TomDLT TomDLT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Failing tests:

  • For SGDRegressor, it is expected that the algorithm does not fully converge to a high-precision solution. It is due to the stochastic nature of the SGD algorithm, I would just skip the test.
  • For SAG/SAGA, it is a bug, which is due to the fact that the random-sample selection does not take into account the sample weights. To fix the bug, we should implement an importance sampling scheme (see LogisticRegression with SAGA using sample_weight does not converge #21305).

@cmarmo cmarmo added the Waiting for Second Reviewer First reviewer is done, need a second one! label Oct 28, 2022
Copy link
Member

@jjerphan jjerphan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank you for adding this theoretical assertion, @lorentzenchr!

@lorentzenchr lorentzenchr removed the Waiting for Second Reviewer First reviewer is done, need a second one! label Nov 18, 2022
@jjerphan jjerphan merged commit aec3735 into scikit-learn:main Nov 18, 2022
@lorentzenchr lorentzenchr deleted the balance_property_linear_models branch November 18, 2022 16:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants