TST Add tests for LinearRegression that sample weights act consistently #15554
Conversation
One was actually proposed in #15015 but is waiting for all estimators to pass. Not sure it's the best strategy; maybe marking it as XFAIL would indeed be better.
This PR adds 4 tests to
I could adapt this PR to only do 3 and 4 for
Thanks @lorentzenchr! I have merged #16507, which should make some of this easier.
Sorry, @lorentzenchr, I reverted #16507 in the end; there were incompatible changes to check_estimator in the meantime. New PR is in #17176.
No, but it would definitely make sense to add them there as well (for estimators that support it).
That PR is finally merged. There is also #17441, which would allow running common tests with non-default parameters (i.e. other values of fit_intercept, solver, etc.). For instance,
Edit: opened #17444 about
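As an illustration of running the same check over non-default parameters, a parametrized test could be sketched as below. The test name, data, and structure are made up for illustration; this is not scikit-learn's common-test machinery.

```python
import numpy as np
import pytest
from sklearn.linear_model import LinearRegression

# Hypothetical sketch: run the same check for both values of fit_intercept
# instead of only the default, as discussed in the thread.
@pytest.mark.parametrize("fit_intercept", [True, False])
def test_fit_intercept_variants(fit_intercept):
    rng = np.random.RandomState(0)
    X = rng.rand(10, 2)
    y = rng.rand(10)
    reg = LinearRegression(fit_intercept=fit_intercept).fit(X, y)
    assert reg.coef_.shape == (2,)
    if not fit_intercept:
        # With fit_intercept=False, the intercept is fixed at zero.
        assert reg.intercept_ == 0.0
```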
Mmh, those tests reveal some shortcomings in
#19616 added some tests, I'll rebase and clean up.
Force-pushed from e3b9ef8 to 2fc0657
Hi @lorentzenchr, what is the state of this PR? Can one pursue it? 🙂
It took me longer than anticipated. The tests show some shortcomings, uncovered in dd4f742.
Thanks @lorentzenchr. What do you think is the best approach to resolve the current assertion failures?
# TODO: This often fails, e.g. when calling
# SKLEARN_TESTS_GLOBAL_RANDOM_SEED="all" pytest \
#     sklearn/linear_model/tests/test_base.py \
#     ::test_linear_regression_sample_weight_consistency
pass
For now, can you mark this test as xfail in this case and explain which assertion is not verified? Is it sensitive to the scaling of y[-5:]?
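The xfail marking suggested here could look like the following sketch. The test body and reason string are hypothetical, not taken from this PR.

```python
import pytest

# Hypothetical sketch of marking a known-failing test as xfail so the suite
# stays green while the shortcoming is tracked. With the default
# strict=False, a run where the assertion happens to pass is reported as
# XPASS rather than failing the suite.
@pytest.mark.xfail(reason="sample weight consistency not yet guaranteed")
def test_known_shortcoming():
    # Placeholder for the assertion that currently fails.
    assert False
```

Note that xfail is usually meant for deterministic failures, which is the objection raised in the reply below this suggestion.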
This fails at random, so xfail is not appropriate IMO.
With #26164 opened, this is clean and ready from my side.
@jjerphan Friendly ping. CI is 🟢. This is "only" a test improvement, no functionality is changed, so a review with less scrutiny is acceptable, I guess.
LGTM. Thank you for your patience, @lorentzenchr.
Ideally, we could combine those tests into a single parametrized one; for now, the corner cases and the GLM implementations make that hardly doable.
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
I thought about that. But as you say, there are too many different corner cases. Can we merge as is with only one review, since only tests are added? Or maybe @rth wants to bring it over the line and press one or two buttons 🤞
Good to have this more extensively tested. I'd like to have this kind of test for more estimators, because the common tests only check one combination and we often discover bugs years later for the other combinations. Thanks @lorentzenchr, LGTM!
TST Add tests for LinearRegression that sample weights act consistently (scikit-learn#15554) Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
Reference Issues/PRs
Relates to #21504.
Make bugs in #15438 visible by introducing tests (with xfail).
What does this implement/fix? Explain your changes.
So far, this PR only adds tests for Ridge and LinearRegression. Branch is based on #15530.
Any other comments?
Would be nice to have the generalizable part of those tests in sklearn/utils/estimator_checks.py: sample_weight = 0 is equivalent to removing that sample (data row). Edit: Ongoing effort in #15015 (add common test that zero sample weight means samples are ignored).
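The zero-weight equivalence described above can be sketched for LinearRegression as follows; the data, slice, and tolerances are made up for illustration and are not the PR's actual test code.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Sketch of the property under test: sample_weight=0 for some rows should
# give the same fit as dropping those rows entirely.
rng = np.random.RandomState(42)
X = rng.rand(20, 3)
y = X @ np.array([1.0, 2.0, -0.5]) + 0.1 * rng.rand(20)

sw = np.ones(20)
sw[-5:] = 0.0  # zero out the last five samples

reg_weighted = LinearRegression().fit(X, y, sample_weight=sw)
reg_dropped = LinearRegression().fit(X[:-5], y[:-5])

# Coefficients and intercept should agree up to numerical tolerance.
assert np.allclose(reg_weighted.coef_, reg_dropped.coef_)
assert np.allclose(reg_weighted.intercept_, reg_dropped.intercept_)
```

A companion property (not shown) is that integer weights should act like repeating the corresponding rows.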