
FEA Add positive argument to Ridge using L-BFGS-B solver #20231


Merged: 50 commits into scikit-learn:main from tnakae:ridge_positive_option on Jul 4, 2021

Conversation

tnakae (Contributor) commented Jun 8, 2021

Reference Issues/PRs

Fixes #19772 (Add positive argument to Ridge regression)

What does this implement/fix? Explain your changes.

  • Added lbfgs solver to Ridge regression.
  • Added positive argument to Ridge regression (a short usage sketch follows this list).
    • When set to True, it forces the coefficients to be positive (the default is False).
    • Only the lbfgs solver supports the positive=True argument.
    • When positive=True and solver="auto", the lbfgs solver is used automatically.
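
A minimal usage sketch of the behavior described above (illustrative, not taken from the PR's diff):

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
X = rng.randn(50, 5)
y = X @ np.array([1.0, 2.0, 3.0, 4.0, 5.0]) + rng.normal(size=50)

# positive=True routes solver="auto" to the new lbfgs solver and
# constrains the fitted coefficients to be non-negative.
reg = Ridge(positive=True).fit(X, y)
assert (reg.coef_ >= 0).all()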

tnakae changed the title from [MRG] Added positive argument to Ridge using trf solver to [WIP] Added positive argument to Ridge using trf solver on Jun 8, 2021
tnakae changed the title from [WIP] Added positive argument to Ridge using trf solver to [MRG] Added positive argument to Ridge using trf solver on Jun 9, 2021
lorentzenchr (Member) commented:

@tnakae Thanks for working on this! I'll try to review.

ogrisel (Member) commented Jun 11, 2021

Could you please add a test that checks that the solver converges to the same solution with positive=True and positive=False when the ground-truth data-generating process is positive?

Something like:

>>> import numpy as np
>>> rng = np.random.RandomState(42)
>>> X = rng.randn(300, 100)
>>> true_coef = rng.uniform(0.1, 1, size=X.shape[1])
>>> y = X @ true_coef + 1 + rng.normal(size=X.shape[0]) / 10
>>> from sklearn.linear_model import Ridge
>>> Ridge().fit(X, y).coef_
array([0.8425452 , 0.88417954, 0.48863294, 0.43436882, 0.65071889,
       0.1547429 , 0.72064224, 0.80411708, 0.45595553, 0.25229772,
       ...
       0.29870553, 0.10603551, 0.69988191, 0.82548552, 0.33983739])

tnakae (Contributor, Author) commented Jun 12, 2021

@ogrisel Added a test based on your comment!

ogrisel (Member) commented Jun 15, 2021

Thanks. I have another test idea: fit such a Ridge model both with positive=True and positive=False on a medium-sized (a few hundred samples and features) random dataset generated by make_regression, then check that the penalized objective is lower for positive=False, and that any small random positive perturbation of the solution found by Ridge(positive=True) yields a higher objective value than the unperturbed solution (a sketch of this check follows below).

This last point is similar to the idea I suggested in #19615 to use the check_min utility to improve the ridge solver tests.
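
A minimal sketch of that perturbation check (assuming scikit-learn's Ridge objective, ||y - Xw - b||^2 + alpha * ||w||^2; the ridge_objective helper is ours, not part of the PR):

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

def ridge_objective(X, y, coef, intercept, alpha):
    # Penalized least-squares objective minimized by Ridge.
    residual = y - X @ coef - intercept
    return residual @ residual + alpha * coef @ coef

X, y = make_regression(n_samples=300, n_features=100, random_state=0)
alpha = 1.0
model_pos = Ridge(alpha=alpha, positive=True).fit(X, y)
model_free = Ridge(alpha=alpha).fit(X, y)

obj_pos = ridge_objective(X, y, model_pos.coef_, model_pos.intercept_, alpha)
obj_free = ridge_objective(X, y, model_free.coef_, model_free.intercept_, alpha)
assert obj_free <= obj_pos  # the unconstrained optimum is at least as good

rng = np.random.RandomState(0)
for _ in range(10):
    # Small positive perturbations keep the candidate feasible (coef >= 0);
    # strict convexity (alpha > 0) means the objective must increase.
    perturbed = model_pos.coef_ + rng.uniform(0, 1e-3, size=X.shape[1])
    assert ridge_objective(X, y, perturbed, model_pos.intercept_, alpha) > obj_pos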

ogrisel (Member) commented Jun 15, 2021

Out of curiosity, could you also please benchmark the solver on a medium-sized dataset and compare it to the speed of the solvers used for positive=False? Although they are not solving the same objective, I would be curious about the performance profile and scalability of this solver. Maybe try various synthetic datasets generated with make_regression, with n_samples / n_features chosen such that Ridge(positive=True) takes between 5 and 10 seconds, for instance.

It would also be interesting to see a screenshot of the output of a profiling run with https://pypi.org/project/viztracer/ just to make sure that there is no unexpected performance problem.
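
For reference, a rough timing harness along those lines (the problem sizes are placeholders, not measurements from this PR):

from time import perf_counter
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

for n_samples, n_features in [(1000, 100), (5000, 500), (10000, 1000)]:
    X, y = make_regression(n_samples=n_samples, n_features=n_features,
                           random_state=0)
    for positive in (False, True):
        tic = perf_counter()
        Ridge(positive=positive).fit(X, y)
        print(f"n_samples={n_samples} n_features={n_features} "
              f"positive={positive}: {perf_counter() - tic:.3f}s")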

tnakae force-pushed the ridge_positive_option branch from 54ccb3f to 45b0aae on June 24, 2021
tnakae changed the title from [WIP] Added positive argument to Ridge using L-BFGS-B solver to [MRG] Added positive argument to Ridge using L-BFGS-B solver on Jun 24, 2021
agramfort (Member) left a comment:

@lorentzenchr or @rth final pass?

thx @tnakae !

lorentzenchr (Member) left a comment:

This is good work! Thank you @tnakae

I had only a few comments.

lorentzenchr (Member) left a comment:

One last thing is to test _solve_lbfgs directly against another solver for positive=False.

tnakae (Contributor, Author) commented Jul 3, 2021

@lorentzenchr

One last thing is to test _solve_lbfgs directly against another solver for positive=False.

Added test function test_lbfgs_solver_consistency.
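
The merged test exercises the private _solve_lbfgs directly; a public-API sketch of the same consistency idea (data and tolerance chosen here for illustration) could look like:

import numpy as np
from sklearn.linear_model import Ridge

# Data whose unconstrained ridge solution is (almost surely) positive,
# so the positivity constraint should not bind.
rng = np.random.RandomState(0)
X = rng.randn(300, 100)
y = X @ rng.uniform(0.1, 1.0, size=100) + rng.normal(size=300) / 10

coef_lbfgs = Ridge(alpha=1.0, solver="lbfgs", positive=True).fit(X, y).coef_
coef_chol = Ridge(alpha=1.0, solver="cholesky").fit(X, y).coef_
np.testing.assert_allclose(coef_lbfgs, coef_chol, rtol=1e-3)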

lorentzenchr (Member) left a comment:

LGTM

lorentzenchr changed the title from [MRG] Added positive argument to Ridge using L-BFGS-B solver to FEA Add positive argument to Ridge using L-BFGS-B solver on Jul 4, 2021
lorentzenchr merged commit 18eef9a into scikit-learn:main on Jul 4, 2021
lorentzenchr (Member) commented:

@tnakae Thanks again. This was a pleasant review experience on my side. Looking forward to the next one :smirk:

tnakae deleted the ridge_positive_option branch on July 4, 2021
sakibguy added a commit to sakibguy/scikit-learn referencing this pull request on Jul 5, 2021: FEA Add positive argument to Ridge using L-BFGS-B solver (scikit-learn#20231)
agramfort (Member) commented:

great work @tnakae !
