ENH avoid futile recomputation of R_sum in sparse_enet_coordinate_descent #31387
Conversation
ENH avoid futile recomputation of R_sum in sparse_enet_coordinate_descent * R_sum is np.sum(residual) and won't change by a coordinate update if X_mean is provided.
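In equation form (a restatement of the claim above, not notation taken from the PR itself): a coordinate update $w_j \leftarrow w_j + \delta$ on the implicitly centered data shifts the residual along a centered column, whose entries sum to zero:

```math
R^{\text{new}} = R - \delta\,(X_{:,j} - \bar{X}_j \mathbf{1}),
\qquad
\sum_{i=1}^{n} R_i^{\text{new}} = \sum_{i=1}^{n} R_i - \delta \sum_{i=1}^{n} \left(X_{ij} - \bar{X}_j\right) = \sum_{i=1}^{n} R_i .
```

Since `np.sum(X[:, j] - X_mean[j])` equals $n\bar{X}_j - n\bar{X}_j = 0$, `R_sum` stays constant across the whole coordinate sweep and only needs to be computed once.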
LGTM. Thanks @lorentzenchr
The code is quite low level, so it's not easy to check the math invariants by inspection. I therefore wrote this script to empirically validate the claims of the PR.
```python
# %%
import scipy.sparse as sp
import numpy as np
from sklearn.linear_model import ElasticNet
from time import perf_counter
rng = np.random.default_rng(42)
X = rng.uniform(size=(1_000, 100_000))
X[X < 0.9] = 0.0 # Sparsify the matrix
X_sparse = sp.csc_array(X)
w_true = np.zeros(X.shape[1])
w_true[0:300] = 10.0
assert X.mean(axis=0)[0:5].min() > 0.01
y = X_sparse @ w_true + 1 + rng.normal(size=X.shape[0]) * 0.1
# %%
reg = ElasticNet(
    alpha=0.1, l1_ratio=0.5, fit_intercept=True, selection="random", random_state=42
)
coef_dense = reg.fit(X, y).coef_
intercept_dense = reg.intercept_
print("10 first dense coefficients:", coef_dense[:10])
print("dense intercept:", intercept_dense)
# %%
for i in range(5):
    tic = perf_counter()
    coef_sparse = reg.fit(X_sparse, y).coef_
    toc = perf_counter()
    print(f"Time for sparse fit: {toc - tic:.3f} seconds")
intercept_sparse = reg.intercept_
print("10 first sparse coefficients:", coef_sparse[:10])
print("sparse intercept:", intercept_sparse)
assert np.allclose(coef_dense, coef_sparse)
assert np.allclose(intercept_dense, intercept_sparse)
```
The results are:
- Both `main` and this branch yield the same coefficients/intercept as expected, and the assertions always pass (they use the dense implementation as a reference that is not touched by this PR).
- The sparse fit time is approximately 3x faster with this optimization on this data (it strongly depends on the sparsity level).
Good job @lorentzenchr!
/cc @agramfort @mathurinm.
@ogrisel Thanks for the benchmark and review. I would not have guessed the size of the improvement 😇. The equivalence of the different solvers is already checked in a test. 😉
ENH avoid futile recomputation of R_sum in sparse_enet_coordinate_descent (#31387) Co-authored-by: Omar Salman <omar.salman2007@gmail.com> Co-authored-by: Omar Salman <omar.salman@arbisoft.com>
Reference Issues/PRs
None

What does this implement/fix? Explain your changes.
This PR removes the unnecessary updates of `R_sum = np.sum(residuals)`, because it does not change by a coordinate update if `X_mean` is provided, i.e., `np.sum(X[:, j] - X_mean[j])` equals 0.

Any other comments?
Should improve the runtime performance of `Lasso` and `ElasticNet` for sparse input `X` a bit.
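A minimal NumPy sketch of the argument (my own illustration with hypothetical names, not the actual Cython code in `sparse_enet_coordinate_descent`): one coordinate step changes the residual along a centered column, so the residual sum is invariant.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 50, 4
X = rng.uniform(size=(n_samples, n_features))
X_mean = X.mean(axis=0)
y = rng.normal(size=n_samples)
w = rng.normal(size=n_features)

# Residual of the implicitly centered model: R = y - (X - X_mean) @ w
R = y - (X - X_mean) @ w
R_sum = R.sum()

# One coordinate descent step on feature j: w[j] += delta
j, delta = 2, 0.7
w[j] += delta
R -= delta * (X[:, j] - X_mean[j])  # incremental residual update

# The centered column sums to zero, so R_sum is unchanged
assert np.isclose((X[:, j] - X_mean[j]).sum(), 0.0)
assert np.isclose(R.sum(), R_sum)
print("R_sum before and after the update:", R_sum, R.sum())
```

This is why the per-update recomputation of `R_sum` can be dropped: it only needs to be computed once before the sweep.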