TST Add tests for LinearRegression that sample weights act consistently #15554


Merged
20 commits merged into scikit-learn:main from lorentzenchr:sw_tests on May 26, 2023

Conversation

lorentzenchr
Member

@lorentzenchr lorentzenchr commented Nov 6, 2019

Reference Issues/PRs

Relates to #21504.

Make bugs in #15438 visible by introducing tests (with xfail).

What does this implement/fix? Explain your changes.

So far, this PR only adds tests for Ridge and LinearRegression.
Branch is based on #15530.

Any other comments?

Would be nice to have the generalizable part of those tests in sklearn/utils/estimator_checks.py:

@rth
Member

rth commented Nov 7, 2019

Would be nice to have the generalizable part of those tests in sklearn/utils/estimator_checks.py:
sample_weight = 0 equivalent of removing that sample (data row).

One was actually proposed in #15015 but is waiting for all estimators to pass. Not sure it's the best strategy, maybe marking it as XFAIL would indeed be better.
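For concreteness, the zero-weight invariance quoted above can be sketched like this (a minimal illustration on assumed well-conditioned dense data; the choice of `Ridge(solver="cholesky")` and the tolerances are arbitrary, not this PR's actual test code):

```python
import numpy as np
from numpy.testing import assert_allclose
from sklearn.linear_model import Ridge

rng = np.random.RandomState(42)
X = rng.rand(20, 3)
y = X @ np.array([1.0, 2.0, -0.5]) + 0.1 * rng.rand(20)

# Give the last 5 samples zero weight ...
sw = np.ones(20)
sw[-5:] = 0.0
est_weighted = Ridge(solver="cholesky").fit(X, y, sample_weight=sw)

# ... which should be equivalent to dropping those rows entirely.
est_trimmed = Ridge(solver="cholesky").fit(X[:-5], y[:-5])

assert_allclose(est_weighted.coef_, est_trimmed.coef_, rtol=1e-6)
assert_allclose(est_weighted.intercept_, est_trimmed.intercept_, rtol=1e-6, atol=1e-10)
```

The weighted and trimmed fits solve the same normal equations, so the coefficients should agree up to numerical precision (for solvers where the invariance actually holds).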

@lorentzenchr
Member Author

This PR adds 4 tests to Ridge and LinearRegression:

  1. sample_weight=np.ones(...) gives the same fit as sample_weight=None. This is better solved for all estimators in #16507 (Common check for sample weight invariance with removed samples), once it is merged.
  2. Setting elements of sample_weight to 0 is equivalent to removing the corresponding samples. This is likewise better solved for all estimators in #16507.
  3. Scaling sample_weight by a positive constant should have no effect.
    This is related to #11316 (Refactor tests for sample weights), #15651 (RFC Semantic of sample_weight in regression metrics) and #15657 (RFC Sample weight invariance properties).
  4. Multiplying sample_weight by 2 is equivalent to repeating the corresponding samples twice.

I could adapt this PR to only do 3 and 4 for Ridge and LinearRegression.
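Points 3 and 4 can be sketched for LinearRegression as follows (an illustrative check on assumed well-conditioned dense data with loose tolerances, not the PR's actual test code; such invariances are known to fail for particular solvers and inputs):

```python
import numpy as np
from numpy.testing import assert_allclose
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
X = rng.rand(15, 2)
y = X @ np.array([2.0, -1.0]) + 0.1 * rng.rand(15)
sw = 1.0 + rng.rand(15)

# Point 3: multiplying all weights by a positive constant
# must not change the fit.
ref = LinearRegression().fit(X, y, sample_weight=sw)
scaled = LinearRegression().fit(X, y, sample_weight=3.14 * sw)
assert_allclose(ref.coef_, scaled.coef_, rtol=1e-6)

# Point 4: weight 2 on a sample equals including that sample twice.
sw2 = np.ones(15)
sw2[0] = 2.0
doubled_w = LinearRegression().fit(X, y, sample_weight=sw2)
X_rep = np.vstack([X, X[:1]])
y_rep = np.concatenate([y, y[:1]])
repeated = LinearRegression().fit(X_rep, y_rep)
assert_allclose(doubled_w.coef_, repeated.coef_, rtol=1e-6)
```

Both pairs of fits minimize the same weighted least-squares objective, so the coefficients should match up to solver precision on data this well conditioned.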

@rth
Member

rth commented May 10, 2020

Thanks @lorentzenchr ! I have merged #16507 which should make some of this easier.

@lorentzenchr
Member Author

@rth Do you know if the tests of #16507 also test sample_weight properties for sparse X? If so, I can get rid of point 1 and 2 (see comment above) at least for this PR.

@rth
Member

rth commented May 10, 2020

Sorry, @lorentzenchr I reverted #16507 in the end, there were incompatible changes to check_estimator meanwhile. New PR is in #17176.

Do you know if the tests of #16507 also test sample_weight properties for sparse X?

No, but it would definitely make sense to add them there as well (for estimators that support it).

@rth
Member

rth commented Jun 4, 2020

New PR is in #17176.

That PR is finally merged.

There is also #17441 which would allow running common tests with non default parameters (i.e. other values of fit_intercept, solver, etc). For instance,

$ pytest sklearn/tests/test_common_non_default.py -k "check_sample_weights_invariance and Ridge and not (RidgeClassifier or RidgeCV or BayesianRidge)" -v
========================================================= test session starts =========================================================
platform linux -- Python 3.8.0, pytest-5.2.1, py-1.8.0, pluggy-0.13.0 -- /home/rth/miniconda3/envs/sklearn-dev/bin/python
cachedir: .pytest_cache
rootdir: /home/rth/src/scikit-learn, inifile: setup.cfg
plugins: forked-1.1.3, xdist-1.31.0
collected 58603 items / 58547 deselected / 56 selected                                                                                

test_common_non_default[Ridge(fit_intercept=False)-check_sample_weights_invariance(kind=ones)] PASSED [  1%]
test_common_non_default[Ridge(fit_intercept=False)-check_sample_weights_invariance(kind=zeros)] PASSED [  3%]
test_common_non_default[Ridge(fit_intercept=False,solver='cholesky')-check_sample_weights_invariance(kind=ones)] PASSED [  5%]
test_common_non_default[Ridge(fit_intercept=False,solver='cholesky')-check_sample_weights_invariance(kind=zeros)] PASSED [  7%]
test_common_non_default[Ridge(fit_intercept=False,solver='lsqr')-check_sample_weights_invariance(kind=ones)] PASSED [  8%]
test_common_non_default[Ridge(fit_intercept=False,solver='lsqr')-check_sample_weights_invariance(kind=zeros)] PASSED [ 10%]
test_common_non_default[Ridge(fit_intercept=False,solver='sag')-check_sample_weights_invariance(kind=ones)] PASSED [ 12%]
test_common_non_default[Ridge(fit_intercept=False,solver='sag')-check_sample_weights_invariance(kind=zeros)] FAILED [ 14%]
test_common_non_default[Ridge(fit_intercept=False,solver='saga')-check_sample_weights_invariance(kind=ones)] PASSED [ 16%]
test_common_non_default[Ridge(fit_intercept=False,solver='saga')-check_sample_weights_invariance(kind=zeros)] FAILED [ 17%]
test_common_non_default[Ridge(fit_intercept=False,solver='sparse_cg')-check_sample_weights_invariance(kind=ones)] PASSED [ 19%]
test_common_non_default[Ridge(fit_intercept=False,solver='sparse_cg')-check_sample_weights_invariance(kind=zeros)] PASSED [ 21%]
test_common_non_default[Ridge(fit_intercept=False,solver='svd')-check_sample_weights_invariance(kind=ones)] PASSED [ 23%]
test_common_non_default[Ridge(fit_intercept=False,solver='svd')-check_sample_weights_invariance(kind=zeros)] PASSED [ 25%]
test_common_non_default[Ridge(fit_intercept=False,normalize=True)-check_sample_weights_invariance(kind=ones)] PASSED [ 26%]
test_common_non_default[Ridge(fit_intercept=False,normalize=True)-check_sample_weights_invariance(kind=zeros)] PASSED [ 28%]
test_common_non_default[Ridge(fit_intercept=False,normalize=True,solver='cholesky')-check_sample_weights_invariance(kind=ones)] PASSED [ 30%]
test_common_non_default[Ridge(fit_intercept=False,normalize=True,solver='cholesky')-check_sample_weights_invariance(kind=zeros)] PASSED [ 32%]
test_common_non_default[Ridge(fit_intercept=False,normalize=True,solver='lsqr')-check_sample_weights_invariance(kind=ones)] PASSED [ 33%]
test_common_non_default[Ridge(fit_intercept=False,normalize=True,solver='lsqr')-check_sample_weights_invariance(kind=zeros)] PASSED [ 35%]
test_common_non_default[Ridge(fit_intercept=False,normalize=True,solver='sag')-check_sample_weights_invariance(kind=ones)] PASSED [ 37%]
test_common_non_default[Ridge(fit_intercept=False,normalize=True,solver='sag')-check_sample_weights_invariance(kind=zeros)] FAILED [ 39%]
test_common_non_default[Ridge(fit_intercept=False,normalize=True,solver='saga')-check_sample_weights_invariance(kind=ones)] PASSED [ 41%]
test_common_non_default[Ridge(fit_intercept=False,normalize=True,solver='saga')-check_sample_weights_invariance(kind=zeros)] FAILED [ 42%]
test_common_non_default[Ridge(fit_intercept=False,normalize=True,solver='sparse_cg')-check_sample_weights_invariance(kind=ones)] PASSED [ 44%]
test_common_non_default[Ridge(fit_intercept=False,normalize=True,solver='sparse_cg')-check_sample_weights_invariance(kind=zeros)] PASSED [ 46%]
test_common_non_default[Ridge(fit_intercept=False,normalize=True,solver='svd')-check_sample_weights_invariance(kind=ones)] PASSED [ 48%]
test_common_non_default[Ridge(fit_intercept=False,normalize=True,solver='svd')-check_sample_weights_invariance(kind=zeros)] PASSED [ 50%]
test_common_non_default[Ridge()-check_sample_weights_invariance(kind=ones)] PASSED    [ 51%]
test_common_non_default[Ridge()-check_sample_weights_invariance(kind=zeros)] PASSED   [ 53%]
test_common_non_default[Ridge(solver='cholesky')-check_sample_weights_invariance(kind=ones)] PASSED [ 55%]
test_common_non_default[Ridge(solver='cholesky')-check_sample_weights_invariance(kind=zeros)] PASSED [ 57%]
test_common_non_default[Ridge(solver='lsqr')-check_sample_weights_invariance(kind=ones)] PASSED [ 58%]
test_common_non_default[Ridge(solver='lsqr')-check_sample_weights_invariance(kind=zeros)] PASSED [ 60%]
test_common_non_default[Ridge(solver='sag')-check_sample_weights_invariance(kind=ones)] PASSED [ 62%]
test_common_non_default[Ridge(solver='sag')-check_sample_weights_invariance(kind=zeros)] FAILED [ 64%]
test_common_non_default[Ridge(solver='saga')-check_sample_weights_invariance(kind=ones)] PASSED [ 66%]
test_common_non_default[Ridge(solver='saga')-check_sample_weights_invariance(kind=zeros)] FAILED [ 67%]
test_common_non_default[Ridge(solver='sparse_cg')-check_sample_weights_invariance(kind=ones)] PASSED [ 69%]
test_common_non_default[Ridge(solver='sparse_cg')-check_sample_weights_invariance(kind=zeros)] PASSED [ 71%]
test_common_non_default[Ridge(solver='svd')-check_sample_weights_invariance(kind=ones)] PASSED [ 73%]
test_common_non_default[Ridge(solver='svd')-check_sample_weights_invariance(kind=zeros)] PASSED [ 75%]
test_common_non_default[Ridge(normalize=True)-check_sample_weights_invariance(kind=ones)] PASSED [ 76%]
test_common_non_default[Ridge(normalize=True)-check_sample_weights_invariance(kind=zeros)] FAILED [ 78%]
test_common_non_default[Ridge(normalize=True,solver='cholesky')-check_sample_weights_invariance(kind=ones)] PASSED [ 80%]
test_common_non_default[Ridge(normalize=True,solver='cholesky')-check_sample_weights_invariance(kind=zeros)] FAILED [ 82%]
test_common_non_default[Ridge(normalize=True,solver='lsqr')-check_sample_weights_invariance(kind=ones)] PASSED [ 83%]
test_common_non_default[Ridge(normalize=True,solver='lsqr')-check_sample_weights_invariance(kind=zeros)] FAILED [ 85%]
test_common_non_default[Ridge(normalize=True,solver='sag')-check_sample_weights_invariance(kind=ones)] PASSED [ 87%]
test_common_non_default[Ridge(normalize=True,solver='sag')-check_sample_weights_invariance(kind=zeros)] FAILED [ 89%]
test_common_non_default[Ridge(normalize=True,solver='saga')-check_sample_weights_invariance(kind=ones)] PASSED [ 91%]
test_common_non_default[Ridge(normalize=True,solver='saga')-check_sample_weights_invariance(kind=zeros)] FAILED [ 92%]
test_common_non_default[Ridge(normalize=True,solver='sparse_cg')-check_sample_weights_invariance(kind=ones)] PASSED [ 94%]
test_common_non_default[Ridge(normalize=True,solver='sparse_cg')-check_sample_weights_invariance(kind=zeros)] FAILED [ 96%]
test_common_non_default[Ridge(normalize=True,solver='svd')-check_sample_weights_invariance(kind=ones)] PASSED [ 98%]
test_common_non_default[Ridge(normalize=True,solver='svd')-check_sample_weights_invariance(kind=zeros)] FAILED [100%]
===== 13 failed, 43 passed, 58547 deselected in 31.16s ===========

Edit: opened #17444 about Ridge(normalize=True)

@cmarmo cmarmo added the module:test-suite label (everything related to our tests) Sep 15, 2020
Base automatically changed from master to main January 22, 2021 10:51
@lorentzenchr lorentzenchr changed the title [WIP] Add tests that sample weights act consistently [MRG] Add tests that sample weights act consistently Jun 27, 2021
@lorentzenchr
Member Author

lorentzenchr commented Jun 27, 2021

Mmh, those tests reveal some shortcomings in Ridge for sparse input, and in LinearRegression even for dense input.

@lorentzenchr
Member Author

#19616 added some tests, I'll rebase and clean up.

@jjerphan
Member

jjerphan commented Mar 6, 2023

Hi @lorentzenchr, what is the state of this PR? Can one pursue it? 🙂

@lorentzenchr lorentzenchr changed the title [MRG] Add tests that sample weights act consistently TST Add tests for LinearRegression that sample weights act consistently Mar 26, 2023
@lorentzenchr
Member Author

It took me longer than anticipated. The tests show some shortcomings, uncovered in dd4f742.

Member

@jjerphan jjerphan left a comment


Thanks @lorentzenchr. In your view, what is the best approach to resolve the current assertion failures?

Comment on lines 794 to 798
# TODO: This often fails, e.g. when calling
# SKLEARN_TESTS_GLOBAL_RANDOM_SEED="all" pytest \
# sklearn/linear_model/tests/test_base.py\
# ::test_linear_regression_sample_weight_consistency
pass
Member


For now, can you mark this test as xfail in this case and explain which assertion is not verified? Is this sensitive to the scaling of y[-5:]?

Member Author

This fails at random, so xfail is not appropriate IMO.

Member Author

I improved the comment in 80513c9 and linked to #26164.

@lorentzenchr
Member Author

With #26164 opened, this is clean and ready from my side.

@lorentzenchr
Member Author

@jjerphan Friendly ping. CI 🟢 This is "only" a test improvement, no functionality is changed, so a review with less scrutiny is acceptable, I guess.

@lorentzenchr lorentzenchr added this to the 1.3 milestone Apr 30, 2023
Member

@jjerphan jjerphan left a comment


LGTM. Thank you for your patience, @lorentzenchr.

Ideally, we could combine those tests into a single parametrized one; for now, the corner cases and the GLM implementations make that hard to do.

Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
@lorentzenchr
Member Author

Ideally, we could combine those tests into a single parametrized one; for now, the corner cases and the GLM implementations make that hard to do.

I thought about that. But as you say, there are too many different corner cases.

Can we merge as is with only one review, since only tests are added? Or maybe @rth wants to bring it over the line and press one or two buttons 🤞

Member

@jeremiedbb jeremiedbb left a comment


Good to have this more extensively tested. I'd like to have this kind of test for more estimators, because the common tests only check one combination of parameters, and we often discover bugs for the other combinations years later. Thanks @lorentzenchr, LGTM!

@jeremiedbb jeremiedbb enabled auto-merge (squash) May 26, 2023 14:58
@jeremiedbb jeremiedbb merged commit 42d2359 into scikit-learn:main May 26, 2023
@lorentzenchr lorentzenchr deleted the sw_tests branch June 2, 2023 07:37
REDVM pushed a commit to REDVM/scikit-learn that referenced this pull request Nov 16, 2023