Skip to content

Fix sample weight handling in scoring _log_reg_scoring_path #29419

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

snath-xoc
Copy link
Contributor

@snath-xoc snath-xoc commented Jul 5, 2024

Reference Issues/PRs

Fixes #29416

What does this implement/fix? Explain your changes.

Added sample weighting for test set into default calculation of scores within _log_reg_scoring_path

TO DO:

  • so far works with max_iter 10_000 and tol 1e-8, expected to work with tol 1e-12 but this fails on lbfgs solver
  • Switch changelog to v1.6.rst

Copy link

github-actions bot commented Jul 5, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: cb2f28b. Link to the linter CI: here

@ogrisel
Copy link
Member

ogrisel commented Jul 5, 2024

Thanks for the PR. Could you please add a non regression test based on the reproducer provided in #29416 and make it run on all solvers with a small value of tol such as tol=1e-12? It's possible that strict equivalence might be hard to reach for stochastic solvers such as SAG/SAGA but I am not sure.

@snath-xoc
Copy link
Contributor Author

Note: in the test_logistic_regression_sample_weight sag and saga are left out as they systematically fail

@ogrisel
Copy link
Member

ogrisel commented Aug 7, 2024

@snath-xoc could you please push a commit that triggers running the updated test in this PR for all admissible values of global_random_seed? Instructions are available in this information issue: #28959.

EDIT: before testing on the CI, you should test locally with commands like the following:

SKLEARN_TESTS_GLOBAL_RANDOM_SEED="all" pytest -k test_logistic_regression_sample_weights sklearn/linear_model/tests/test_logistic.py -vlx

at the moment, the test fails for many seeds but this can be fixed as explained in the review comments below.

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is some feedback to help move this PR forward. Please don't forget to add an entry to the changelog.

@ogrisel
Copy link
Member

ogrisel commented Aug 7, 2024

I find this test too long to follow. You might want to decouple the part that tests equivalence between integer sample weights and their repeated counter part from the part of the test that checks equivalence between class weights and sample weights into two independent tests.

@ogrisel
Copy link
Member

ogrisel commented Aug 9, 2024

To actually trigger [all random seeds] tests you need to include the list of test function names after the [all random seeds] flag in the commit message as explained in #28959.

snath-xoc and others added 10 commits August 21, 2024 15:53
test_logistic_regression_sample_weights
test_logistic_regression_sample_weights
test_logistic_regression_sample_weights
test_logistic_regression_sample_weights
test_logistic_regression_sample_weights
test_sample_and_class_weight_equivalence_liblinear
test_logistic_regression_sample_weights
test_logistic_regression_solver_class_weights
test_sample_and_class_weight_equivalence_liblinear
test_logistic_regression_sample_weights
test_logistic_regression_solver_class_weights
test_sample_and_class_weight_equivalence_liblinear
test_logistic_regression_sample_weights
test_logistic_regression_solver_class_weights
test_sample_and_class_weight_equivalence_liblinear
test_logistic_regression_sample_weights
test_logistic_regression_solver_class_weights
@snath-xoc snath-xoc marked this pull request as ready for review August 26, 2024 09:16
Copy link
Contributor

@OmarManzoor OmarManzoor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise LGTM. Thanks @snath-xoc

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more minor feedback on top of Omar's but otherwise, LGTM!

@ogrisel
Copy link
Member

ogrisel commented Sep 5, 2024

Note for later: the LogisticRegressionCV class has the following tag to skip the common test:

    def _more_tags(self):
        return {
            "_xfail_checks": {
                "check_sample_weights_invariance": (
                    "zero sample_weight is not equivalent to removing samples"
                ),
            }
        }

However check_sample_weights_invariance currently does not work well with kind="zeros" in the presence of a cv constructor parameter. This common test should be updated to drop samples without moving the CV boundaries for the remaining training points.

This should better be done in a dedicated PR though.

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Omar Salman <omar.salman2007@gmail.com>
@ogrisel ogrisel enabled auto-merge (squash) September 6, 2024 08:49
@ogrisel ogrisel merged commit 7baa11e into scikit-learn:main Sep 6, 2024
28 checks passed
@ogrisel ogrisel changed the title add sample weight to default scoring _log_reg_scoring_path Fix sample weight handling in scoring _log_reg_scoring_path Sep 6, 2024
@ogrisel
Copy link
Member

ogrisel commented Sep 6, 2024

Merged! Thanks very much @snath-xoc!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

LogisticRegressionCV does not handle sample weights as expected when using liblinear solver
3 participants