
ENH add gap safe screening rules to enet_coordinate_descent #31882

Open: wants to merge 19 commits into main

Conversation


@lorentzenchr (Member) commented Aug 5, 2025

Reference Issues/PRs

Solves #229. The 2nd oldest open issue at the time of opening this PR!!!

What does this implement/fix? Explain your changes.

This PR adds Gap Safe Screening Rules for the private function enet_coordinate_descent.

To make it testable by others, I added a private attribute _do_screening (bool) to several classes like Lasso.

Any other comments?

Following https://arxiv.org/abs/1802.07481, but without acceleration or working sets, to keep it simple.
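
For context, in standard notation (a summary of the rule, not text from the paper), the gap safe sphere test for the Lasso reads: with primal objective $P(\beta) = \tfrac{1}{2}\lVert y - X\beta\rVert_2^2 + \lambda\lVert\beta\rVert_1$, a dual feasible point $\theta$ (e.g. a rescaled residual), and duality gap $G(\beta, \theta) = P(\beta) - D(\theta)$, feature $j$ can be safely discarded whenever

$$|x_j^\top \theta| + \lVert x_j\rVert_2 \,\sqrt{2\, G(\beta, \theta)} < \lambda \;\implies\; \hat{\beta}_j = 0.$$

Since the duality gap is computed anyway for the stopping criterion, the screening test comes almost for free.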

For reviewers: Please first merge #31906 and #31905.

@lorentzenchr changed the title from "Gap safe" to "ENH add gap safe screening rules to enet_coordinate_descent" on Aug 5, 2025
github-actions bot commented Aug 5, 2025: ✔️ Linting Passed. All linting checks passed (generated for commit 537b41a).

@lorentzenchr (Member, Author) commented Aug 5, 2025

A little benchmark on the leukemia dataset

1. Full Lasso Path with 100 steps

n_samples = 72, n_features = 7129

Starting path computation...
Lasso without screening - time: 0.18 s
Lasso with screening    - time: 0.10 s
Lasso without screening - time: 1.66 s
Lasso with screening    - time: 0.53 s
Lasso without screening - time: 9.75 s
Lasso with screening    - time: 1.62 s
[Figure: bar chart of path computation time vs stopping tolerance, with and without screening]

Code

# https://github.com/mathurinm/celer/blob/main/examples/plot_leukemia_path.py
import time
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.linear_model import lasso_path
from sklearn.datasets import fetch_openml

print("Loading data...")
dataset = fetch_openml("leukemia")
X = np.asfortranarray(dataset.data.astype(float))
y = 2 * ((dataset.target != "AML") - 0.5)
n_samples = len(y)

y -= np.mean(y)
y /= np.std(y)

print("Starting path computation...")
alpha_max = np.max(np.abs(X.T.dot(y))) / n_samples

n_alphas = 100
alphas = alpha_max * np.geomspace(1, 0.01, n_alphas)

tols = [1e-2, 1e-3, 1e-4]
results = np.zeros([2, len(tols)])
for tol_ix, tol in enumerate(tols):
    t0 = time.time()
    # do_screening is the private flag added by this PR
    _, coefs, gaps = lasso_path(
        X, y, alphas=alphas, tol=tol, max_iter=10_000, do_screening=False)
    results[0, tol_ix] = time.time() - t0
    print('Lasso without screening - time: %.2f s' % results[0, tol_ix])

    t0 = time.time()
    _, coefs, dual_gaps = lasso_path(
        X, y, alphas=alphas, tol=tol, max_iter=10_000, do_screening=True)
    results[1, tol_ix] = time.time() - t0
    print('Lasso with screening    - time: %.2f s' % results[1, tol_ix])

df = pd.DataFrame(results.T, columns=["no screening", "with screening"])
df.index = [str(tol) for tol in tols]
df.plot.bar(rot=0)
plt.xlabel("stopping tolerance")
plt.ylabel("path computation time (s)")
plt.tight_layout()
plt.show()

2. Single alpha

[Figure: bar charts of single-alpha computation time vs tolerance, one panel per alpha scale]

Code

# For single values of alpha
import time
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

from sklearn.linear_model import lasso_path
from sklearn.datasets import fetch_openml


print("Loading data...")
dataset = fetch_openml("leukemia")
X = np.asfortranarray(dataset.data.astype(float))
y = 2 * ((dataset.target != "AML") - 0.5)
n_samples = len(y)

y -= np.mean(y)
y /= np.std(y)

print("Starting path computation...")
alpha_max = np.max(np.abs(X.T.dot(y))) / n_samples

tols = [1e-2, 1e-3, 1e-4]
results = []  # scale_alpha_max, tol, do_screening, time
penalty_scales = [1e-1, 1e-2, 1e-5]

for scale_alpha_max in penalty_scales:
    alpha = [alpha_max * scale_alpha_max]
    print(f"alpha = alpha_max * {scale_alpha_max}")
    for tol_ix, tol in enumerate(tols):
        t0 = time.time()
        _, coefs, gaps = lasso_path(
            X, y, alphas=alpha, tol=tol, max_iter=10_000, do_screening=False)
        t = time.time() - t0
        results += [[scale_alpha_max, tol, False, t]] 
        print('  Lasso without screening - time: %.2f s' % t)

        t0 = time.time()
        _, coefs, dual_gaps = lasso_path(
            X, y, alphas=alpha, tol=tol, max_iter=10_000, do_screening=True)
        t = time.time() - t0
        results += [[scale_alpha_max, tol, True, t]] 
        print('  Lasso with screening    - time: %.2f s' % t)

df = pd.DataFrame(
    results,
    columns=["scale_alpha_max", "tol", "do_screening", "time [s]"],
)
print(df)

sns.catplot(
    df, x="tol", y="time [s]", hue="do_screening",
    col="scale_alpha_max", kind="bar", errorbar=None,
    facet_kws={"sharey": False},
)
plt.show()

@ogrisel self-assigned this Aug 6, 2025
@ogrisel (Member) commented Aug 6, 2025

Based on the build failure of some CI runners, it seems that this code requires bumping the minimum Cython version. I think that is fine because Cython is not a runtime dependency.

@ogrisel (Member) commented Aug 6, 2025

Any idea why this PR would lead to a change of predicted values in this case?

Examples
--------
>>> from sklearn.linear_model import LassoCV
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(noise=4, random_state=0)
>>> reg = LassoCV(cv=5, random_state=0).fit(X, y)
>>> reg.score(X, y)
0.9993
>>> reg.predict(X[:1,])
Expected:
    array([-78.4951])
Got:
    array([-79.4755331])

Maybe we have nearly tied choices for the optimal value of alpha on this data?

@lorentzenchr (Member, Author) replied:

> Any idea why this PR would lead to a change of predicted values in this case?

The solution is within the uncertainty of tol. One of the main changes is that this PR adds an additional convergence check before the main loop: if the provided (warm-started) coefficients already satisfy the stopping criterion, then stop. In main, it always computes at least one complete cycle over the coordinates/features.

# warm-started fit that is already converged: the new pre-loop check
# should stop after the first gap evaluation (n_iter_ == 1)
model.set_params(warm_start=True)
model.fit(X, y)
n_iter_warm_start = model.n_iter_
assert n_iter_warm_start == 1

Review thread on the diff context:

if sparse_X:
    # TODO: sparse_enet_coordinate_descent is not yet updated.
Member review comment:
Similarly we should address this TODO before merging this PR.

@lorentzenchr (Member, Author) replied:
No, for two reasons:

  1. Let's keep this PR small.
  2. Testing: if we only touch this one CD function, our tests can compare it against the others and guarantee the same results with high precision (small tol).

@ogrisel (Member) commented Aug 7, 2025

Also: could you update the docstrings and the user guide to reference the gap safe screening rule paper?

@ogrisel (Member) left a review comment:

From a cursory glance at the diff and the paper, this PR implements some basic active set based on the gap safe rule written in the Celer paper; however, it does not seem to implement the precise active set construction described in Algorithm 4. On top of this, the Celer project implements other things, such as the "dual extrapolation" from https://arxiv.org/abs/1907.05830, which probably amplifies the expected speed-ups. So I suspect that the performance of this PR is still far from the potential gains one could obtain from the full-fledged Celer implementation. Have you tried to run some benchmark against it to assess the remaining difference?

I don't mean to block this PR for that, because it seems to already provide a net improvement over main, and I am fine with proceeding iteratively, but we should not imply in the doc or changelog that scikit-learn implements all the things described in the Celer paper (and even less so if you consider the dual extrapolation paper on top).

Could you please add a changelog entry to describe the contents of this PR?

Once done, and once #31882 (comment) (both with cyclic and shuffling CD) and #31882 (comment) are addressed, LGTM.

@lorentzenchr (Member, Author) commented Aug 8, 2025

First of all, @ogrisel, thanks for your review, much appreciated.

> From a cursory glance at the diff and the paper, this PR implements some basic active set based on the gap safe rule written in the Celer paper [...] Have you tried to run some benchmark against it to assess the remaining difference?

And I have nowhere claimed to have added working sets or dual extrapolation; see the PR description. Again, this was (under 200 lines of code originally, now over 500) a pretty neat and small PR, introducing screening rules for a single Cython function that already gives a massive (but not maximal) speedup.

The neat thing is that it is minimal:

  • Add a check of the stopping criterion before entering the main loop.
  • Whenever the stopping criterion is calculated, we already have the dual gap; therefore we then screen variables (this step could later be fine-tuned). See the sketch below.
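
To make the two bullet points concrete, here is a hypothetical pure-Python sketch of the modified CD loop (the actual change lives in the Cython function enet_coordinate_descent; the function name and the unscaled objective 0.5*||y - Xw||^2 + lam*||w||_1 are mine, chosen for readability):

import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd_gap_safe(X, y, lam, tol=1e-6, max_iter=1000):
    """Cyclic CD for 0.5*||y - Xw||^2 + lam*||w||_1 with gap safe screening."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    active = np.ones(n_features, dtype=bool)   # features not yet screened out
    col_norms2 = np.einsum("ij,ij->j", X, X)
    col_norms = np.sqrt(col_norms2)
    R = y - X @ w                              # residual, kept up to date

    def gap_and_screening_mask():
        # rescale the residual into a dual feasible point: ||X^T theta||_inf <= lam
        theta = R / max(1.0, np.max(np.abs(X.T @ R)) / lam)
        primal = 0.5 * R @ R + lam * np.abs(w).sum()
        dual = 0.5 * (y @ y) - 0.5 * (y - theta) @ (y - theta)
        gap = primal - dual
        # gap safe sphere test: |x_j^T theta| + r * ||x_j|| < lam  =>  w_j = 0
        radius = np.sqrt(2.0 * max(gap, 0.0))
        return gap, np.abs(X.T @ theta) + radius * col_norms < lam

    # change 1: check the stopping criterion before entering the main loop, so an
    # already-converged warm start costs one gap evaluation, not a full cycle
    gap, screened = gap_and_screening_mask()
    if gap <= tol:
        return w
    active &= ~screened

    for _ in range(max_iter):
        for j in np.flatnonzero(active):
            if col_norms2[j] == 0.0:
                continue
            w_j_old = w[j]
            # exact minimization along coordinate j (smooth part is quadratic)
            w[j] = soft_threshold(w_j_old + X[:, j] @ R / col_norms2[j],
                                  lam / col_norms2[j])
            if w[j] != w_j_old:
                R += X[:, j] * (w_j_old - w[j])
        # change 2: whenever the gap is computed for the stopping criterion,
        # screen features at no extra cost
        gap, screened = gap_and_screening_mask()
        if gap <= tol:
            break
        active &= ~screened
    return w

With the same tol, runs with and without screening reach the same duality gap, so the solutions agree up to the stopping tolerance.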

> I don't mean to block this PR for that [...] but we should not imply in the doc or changelog that scikit-learn implements all the things described in the Celer paper (and even less so if you consider the dual extrapolation paper on top).

It nowhere does!

> Also: could you update the docstrings and the user guide to reference the gap safe screening rule paper?

This took me a whole day; that's longer than the implementation of the screening rule itself!

@ogrisel (Member) commented Aug 11, 2025

> And I have nowhere claimed to have added working sets or dual extrapolation; see the PR description.

Sorry, I did not mean to imply that you made that claim in the PR; I was just pointing out that we should not leave the impression that we implement the full Celer algorithm when referencing the Celer paper in the docstrings of the estimators (or the user guide).

@ogrisel (Member) left a review comment:

A few typos, but otherwise, LGTM.

Note: the lockfiles have been concurrently updated this morning, hence the conflict.

@lorentzenchr (Member, Author) commented Aug 11, 2025

Task list:

Coordinate Descent with Gap Safe Screening Rules
------------------------------------------------

Coordinate descent (CD) is a strategy to solve the minimization problem that considers a
Contributor review comment:

I know this is the way it is presented in sklearn, but just for completeness, CD is not usually exact minimization along coordinate $j$. It rather corresponds to taking a proximal gradient descent step to approximately minimize the function wrt its j-th coordinate.
Here the smooth part of the objective function is quadratic, so this proximal step turns out to exactly minimize the function. But that's a very particular case. CD for logistic regression, for example, does not exactly minimize the function along coordinate j.
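
For readers less familiar with this viewpoint, the generic proximal coordinate step being referred to is (standard formulation, not a quote from the PR), with $f$ the smooth part of the objective and $L_j$ a Lipschitz constant of $\nabla_j f$:

$$\beta_j \leftarrow \operatorname{prox}_{(\lambda/L_j)\,\lvert\cdot\rvert}\Big(\beta_j - \tfrac{1}{L_j}\nabla_j f(\beta)\Big) = \operatorname{sign}(z)\,\max\big(\lvert z\rvert - \lambda/L_j,\; 0\big), \qquad z = \beta_j - \tfrac{1}{L_j}\nabla_j f(\beta).$$

For the quadratic $f(\beta) = \tfrac{1}{2}\lVert y - X\beta\rVert_2^2$ we have $\nabla_j f(\beta) = -x_j^\top(y - X\beta)$ and $L_j = \lVert x_j\rVert_2^2$, so this single step exactly minimizes over $\beta_j$; for, e.g., the logistic loss it only approximately does.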

@lorentzenchr (Member, Author) replied:

Thanks for the wider context.

For the historical context - and my education - didn't proximal gradient thinking come later than CD?

@lorentzenchr (Member, Author):

@mathurinm Thanks for your review. Very valuable!

@lorentzenchr (Member, Author):

If CD pipelines are green, this is ready for merge.

Note that I removed all the private do_screening things. Only in enet_path can one still pass do_screening as a private (and undocumented) parameter (via params), as sketched below.
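
For reference, exercising the flag would look roughly like this (a sketch assuming the final state of this PR; do_screening is private, undocumented, and may change):

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import enet_path

X, y = make_regression(n_samples=50, n_features=200, random_state=0)
X = np.asfortranarray(X)  # the CD solver works on Fortran-ordered arrays

# do_screening is forwarded via **params to the coordinate descent solver
alphas, coefs, dual_gaps = enet_path(X, y, l1_ratio=1.0, do_screening=True)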

@lorentzenchr (Member, Author):

We are back at a neat PR: without the addition to the user guide, the changed lines of code are +278 -78.

Merging this pull request may close issue #229: Strong Rules / Gap Safe Screening for Discarding Predictors in Lasso-type Problems.