ENH add gap safe screening rules to enet_coordinate_descent #31882
base: main
Conversation
Based on the build failure of some CI runners, it seems that this code requires bumping up the minimum Cython version. I think it's fine because it's not a runtime dependency.
Any idea why this PR would lead to a change of predicted values in this case?
Maybe we have nearly tied choices for the optimal value of …
The solution is within the uncertainty of …
model.set_params(warm_start=True)
model.fit(X, y)
n_iter_warm_start = model.n_iter_
assert n_iter_warm_start == 1
if sparse_X:
    # TODO: sparse_enet_coordinate_descent is not yet updated.
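A self-contained variant of the warm-start check above (the data, `alpha`, and `tol` values here are illustrative assumptions, not the PR's actual test): after a converged fit, refitting with `warm_start=True` restarts from the previous coefficients, so coordinate descent should need (almost) no further passes.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.RandomState(0)
X = rng.randn(50, 10)
y = X @ rng.randn(10) + 0.1 * rng.randn(50)

model = Lasso(alpha=0.1, tol=1e-8, max_iter=10_000)
model.fit(X, y)
n_iter_cold = model.n_iter_  # iterations from a cold start (zeros)

model.set_params(warm_start=True)
model.fit(X, y)  # restarts from the already-converged coefficients
n_iter_warm = model.n_iter_

# the warm restart should not need more passes than the cold fit
assert n_iter_warm <= n_iter_cold
```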
Similarly we should address this TODO before merging this PR.
No, for two reasons:
- Let's keep this PR small.
- Testing: if we only touch this one CD function, our tests compare it to the others and guarantee the same results with high precision (small tol).
Also: could you update the docstrings and the user guide to reference the gap safe screening rule paper?
From a cursory glance at the diff and the paper, this PR implements a basic active set based on the gap safe rule described in the Celer paper; however, it does not seem to implement the precise active set construction described in Algorithm 4. On top of this, the Celer project implements other things such as the "dual extrapolation" from https://arxiv.org/abs/1907.05830, which probably amplifies the expected speed-ups. So I suspect that the performance of this PR is still far from the potential gains one could obtain from the full-fledged Celer implementation. Have you tried to run some benchmark against it to assess the remaining difference?
I don't mean to block this PR for that, because it seems to already provide a net improvement over main, and I am fine with proceeding iteratively, but we should not imply in the doc or changelog that scikit-learn implements all the things described in the Celer paper (even less so if you consider the dual extrapolation paper on top).
Could you please add a changelog entry to describe the contents of this PR?
Once done, and once #31882 (comment) (both with cyclic and shuffling CD) and #31882 (comment) are addressed, LGTM.
First of all, @ogrisel, thanks for your review, much appreciated.
And I have nowhere claimed to have added working sets or dual extrapolation, see the PR description. The neat thing is that this PR is minimal:
It nowhere does!
This took me a whole day; that's longer than the implementation of the screening rule itself!
Sorry, I did not mean to imply that you made that claim in the PR; I was just pointing out that we should not leave the impression that we implement the full Celer algorithm when referencing the Celer paper in the docstring of the estimators (or the user guide).
A few typos, but otherwise, LGTM.
Note: the lockfiles have been concurrently updated this morning hence the conflict.
doc/whats_new/upcoming_changes/sklearn.linear_model/31882.enhancement.rst
Task list:
doc/modules/linear_model.rst
Coordinate Descent with Gap Safe Screening Rules
------------------------------------------------

Coordinate descent (CD) is a strategy to solve the minimization problem that considers a …
I know this is the way it is presented in sklearn, but just for completeness, CD is not usually exact minimization along a coordinate.
Here the smooth part of the objective function is quadratic, so this proximal step turns out to exactly minimize the function. But that's a very particular case. CD for logistic regression, for example, does not exactly minimize the function along coordinate j.
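To make the distinction concrete, here is a minimal NumPy sketch (function names are illustrative, not scikit-learn's internals) of one proximal coordinate step for the Lasso objective (1/(2n))‖y − Xw‖² + α‖w‖₁. Because the smooth part is quadratic, the gradient-plus-soft-threshold step is also the exact minimizer along coordinate j:

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * |.|: shrink z toward 0 by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def cd_update(w, j, X, y, alpha):
    """Exact minimization of (1/(2n)) * ||y - X w||^2 + alpha * ||w||_1
    along coordinate j (valid because the smooth part is quadratic)."""
    n = X.shape[0]
    # residual with coordinate j's contribution removed
    r = y - X @ w + X[:, j] * w[j]
    rho = X[:, j] @ r / n            # (1/n) * x_j^T r
    norm2 = X[:, j] @ X[:, j] / n    # (1/n) * ||x_j||^2
    w[j] = soft_threshold(rho, alpha) / norm2
    return w
```

Sweeping `cd_update` over j = 0, …, p−1 repeatedly is exactly cyclic coordinate descent; for a non-quadratic smooth term (e.g. logistic loss), the analogous prox step is only a descent step, not a coordinate-wise minimizer.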
Thanks for the wider context.
For the historical context - and my education - didn't proximal gradient thinking come later than CD?
@mathurinm Thanks for your review. Very valuable!
This reverts commit 53ac9dc.
* default to do_screening=True in enet_path
If CD pipelines are green this is ready for merge. Note that I removed all the private …
…hon 3.1" This reverts commit e71e89b.
We are back at a neat PR: without the addition to the user guide, the changed lines of code are +278 −78.
Reference Issues/PRs
Solves #229. The 2nd oldest open issue at the time of opening this PR!!!
What does this implement/fix? Explain your changes.
This PR adds Gap Safe Screening Rules for the private function enet_coordinate_descent.
In order to make it testable for others, I added a private attribute _do_screening (bool) to several classes like Lasso.
.Any other comments?
Following https://arxiv.org/abs/1802.07481, but without acceleration or working sets, to keep it simple.
For reviewers: Please first merge #31906 and #31905.