ENH Loss module LogisticRegression #21808
Conversation
Benchmark of Fit Time of LogisticRegression lbfgs and newton-cg (Edit: updated 17.12.2021)
Hardware: Intel Core i7-8559U, 8th generation, 16 GB RAM.
Results
Summary:
- Binary Logistic Regression (left plots): within error bounds, this PR only improves fit times. Using multiple cores adds a runtime overhead that only starts to pay off for larger sample sizes.
- Multiclass/Multinomial Logistic Regression (right plots): same as binary. Note that memory consumption should be lower, as this PR uses a label-encoded target (aka ordinal encoding) while master uses binarized label encoding (aka one-hot encoding).
@agramfort @rth You might be interested.
@lorentzenchr can you point me to the lines that do the magic here?
Some comments on important code segments.
sklearn/linear_model/_logistic.py (outdated)
    if solver in ["lbfgs", "newton-cg"] and len(classes_) == 1:
        n_threads = _openmp_effective_n_threads()
Maybe this should also depend on n_jobs, e.g. n_threads = min(n_jobs, _openmp_effective_n_threads()).
We decided to restrict n_jobs to control joblib-level parallelism in other estimators (KMeans, HGBDT). So for consistency I would stick to that policy for now. To control the number of OpenMP (and possibly also OpenBLAS) threads, users can rely on threadpoolctl, which is a dependency of scikit-learn.
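For illustration, a minimal sketch of that user-level control with threadpoolctl's threadpool_limits context manager (the dataset and thread counts below are arbitrary choices for the example):

from threadpoolctl import threadpool_limits
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1_000, n_features=20, random_state=0)

# Cap every supported threadpool (OpenMP, OpenBLAS, MKL, ...) around the fit.
with threadpool_limits(limits=2):
    LogisticRegression().fit(X, y)

# Or restrict only the OpenMP runtime used by scikit-learn's Cython code.
with threadpool_limits(limits=2, user_api="openmp"):
    LogisticRegression().fit(X, y)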
However, I suspect that the iterated calls to OpenMP Cython code and OpenBLAS code can lead to performance slowdowns when OpenBLAS uses its own native threading layer instead of the same OpenMP runtime as scikit-learn. This can be confirmed by comparing the performance of this PR in 2 environments:
- one that installs scipy from PyPI, which uses an OpenBLAS with its own threadpool (independent of the threadpool used by OpenMP)
- another env where scipy's OpenBLAS is built to use the OpenMP runtime, as explained here: Poor performance of sklearn.cluster.KMeans for numpy >= 1.19.0 #20642 (comment)
Profiling on a MNIST-shaped problem shows that this does not seem to be a problem in practice. Maybe this is because the 2 OpenBLAS calls dominate the Cython calls, which are comparatively negligible.
I only tried on macOS though. Maybe the OpenBLAS/OpenMP story is different on Linux.
What is the conclusion? No code change needed?
I only tried on macOS though. Maybe the OpenBLAS/OpenMP story is different on Linux.
@ogrisel: I can experiment with that, do you have your script at hand? If not, no problem.
No code change needed?
I don't think a code change is required now to change the way Cython/prange/OpenMP threading is used in this PR, but it might be important to keep this potential problem in mind if we observe slower-than-expected performance on data with different shapes (e.g. smaller data sizes, where the interacting-threadpool problem might still be present), or in other locations in the scikit-learn code base whenever we have the following code pattern:
while not converged:
    step 1: Cython prange loop that uses OpenMP threads
    step 2: numpy matrix multiply / scipy.linalg BLAS op that uses OpenBLAS threads
The long-term fix for this kind of problem would ideally happen at the community level in upstream packages (scipy/scipy#15129), but if we ever need to work around it for a specific case in scikit-learn before that happens, we can always decide to use threadpoolctl (or just pass num_threads=1 to prange) to disable threading either for step 1 or for step 2 when we detect that OpenBLAS is packaged in a way that relies on its native threadpool instead of sharing the OpenMP threadpool.
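As a rough sketch of that detection (my own illustration based on threadpoolctl's documented API, not code from this PR):

from contextlib import nullcontext
from threadpoolctl import threadpool_info, threadpool_limits

# "threading_layer" is only reported by recent threadpoolctl/OpenBLAS versions.
openblas_shares_openmp = any(
    info.get("internal_api") == "openblas"
    and info.get("threading_layer") == "openmp"
    for info in threadpool_info()
)

if openblas_shares_openmp:
    limiter = nullcontext()
else:
    # Two competing threadpools: keep OpenMP for the prange code (step 1)
    # and run the BLAS calls (step 2) single-threaded.
    limiter = threadpool_limits(limits=1, user_api="blas")

with limiter:
    ...  # placeholder for the while-not-converged solver loop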
OpenMathLib/OpenBLAS#3187 relates to the issue @ogrisel mentions regarding the code pattern above.
Actually, looking at this again, and independently of the hypothetical OpenBLAS / OpenMP threadpool interaction problem, there might be a problematic case when n_jobs > 1, because we could get over-subscription between joblib parallelism and OpenMP parallelism. Maybe we should always hardcode n_threads=1 to stay on the safe side for this PR.
We might want to add a TODO comment saying that we should refactor this to avoid using joblib parallelism entirely when doing binary and multinomial multiclass classification, and only use joblib for the one-vs-rest multiclass case. This is probably beyond the scope of this PR.
Done in dea9bf0 (sorry for the stupid commit message).
# Here we may safely assume HalfMultinomialLoss aka categorical
# cross-entropy.
# HalfMultinomialLoss computes only the diagonal part of the hessian, i.e.
# diagonal in the classes. Here, we want the matrix-vector product of the
# full hessian. Therefore, we call gradient_proba.
This is really specific to the multinomial case / n_classes >= 3. The hessian returned by HalfMultinomialLoss is only the diagonal part, diagonal in the classes. Therefore, we call the method gradient_proba, which exists only for HalfMultinomialLoss for exactly this reason: to define hessp such that it computes products of a vector with the full hessian, for which the gradient and predicted probabilities are enough.
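For context, this rests on the standard softmax cross-entropy identity (stated here for illustration, not quoted from the PR): with per-sample probabilities $p = \mathrm{softmax}(z)$ of the raw predictions $z$, the per-sample Hessian of the multinomial loss with respect to $z$ is
$H = \operatorname{diag}(p) - p p^{\top}$,
so a Hessian-vector product
$(H v)_k = p_k \bigl( v_k - \sum_j p_j v_j \bigr)$
can be evaluated from the predicted probabilities alone.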
Here is a first round of feedback. I did some profiling on macOS with MNIST.
Here is the flamegraph generated by py-spy:
The two numpy matrix multiplications that involve X clearly dominate the duration of one iteration (as seen in the line numbers in the flamegraph above: _linear_loss.py lines 91 and 182).
The performance is very similar to that of main. The Cython code is almost invisible compared to the numpy matrix multiplications.
Here is the benchmark script I used:
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from joblib import Memory
from sklearn.preprocessing import scale
from time import perf_counter
# Cache in memory to avoid profiling data generation
m = Memory(location="/tmp/joblib")
make_classification = m.cache(make_classification)
# MNIST-sized dataset
X, y = make_classification(
n_samples=60_000, n_features=784, n_informative=500, n_classes=10,
random_state=0
)
X = scale(X)
print("Fitting logistic regression...")
tic = perf_counter()
clf = LogisticRegression(max_iter=300).fit(X, y)
toc = perf_counter()
print(f"Fitting took {toc - tic:.3f} seconds ({clf.n_iter_} iterations)")
You can use py-spy or viztracer to profile it and interactively explore the reports.
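For example, the fit call from the script above could be wrapped with viztracer's context manager (a minimal sketch; the output filename is an arbitrary choice):

from viztracer import VizTracer

with VizTracer(output_file="logreg_trace.json"):
    LogisticRegression(max_iter=300).fit(X, y)
# The resulting JSON report can then be explored interactively, e.g. with vizviewer.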
sklearn/linear_model/_linear_loss.py (outdated)
    The intercept term is at the end of the coef array:
    if loss.is_multiclass:
        coef[n_features::n_dof] = coef[(n_dof-1)::n_dof]
Suggested change:
-    coef[n_features::n_dof] = coef[(n_dof-1)::n_dof]
+    intercept = coef[n_features::n_dof] = coef[(n_dof-1)::n_dof]
+    intercept.shape = (n_classes,)
This notation implies that all intercept values are contiguously packed at the end of the flat coef array. However in the code we have
w = coef.reshape(self._loss.n_classes, -1)
if self.fit_intercept:
intercept = w[:, -1]
w = w[:, :-1]
which means that if coef is contiguous, neither w nor intercept is contiguous. Furthermore, the intercept values are not packed at the end of coef anymore.
I tried to evaluate the performance impact of using a non-contiguous coef on a MNIST-shaped problem, and at least on my machine it does not have a significant impact:
In [1]: import numpy as np
In [2]: n_samples, n_features, n_classes = 60000, 784, 10
In [3]: X = np.random.randn(n_samples, n_features)
In [5]: flat_coef = np.random.normal(size=(n_classes * (n_features + 1)))
In [6]: coef = flat_coef.reshape(n_classes, -1)[:, :-1]
In [7]: intercept = flat_coef.reshape(n_classes, -1)[:, -1]
In [8]: coef.flags
Out[8]:
C_CONTIGUOUS : False
F_CONTIGUOUS : False
OWNDATA : False
WRITEABLE : True
ALIGNED : True
WRITEBACKIFCOPY : False
UPDATEIFCOPY : False
In [9]: intercept.flags
Out[9]:
C_CONTIGUOUS : False
F_CONTIGUOUS : False
OWNDATA : False
WRITEABLE : True
ALIGNED : True
WRITEBACKIFCOPY : False
UPDATEIFCOPY : False
In [10]: coef_contig = np.ascontiguousarray(coef)
In [11]: intercept_contig = np.ascontiguousarray(intercept)
In [12]: %timeit X @ coef.T + intercept
54.8 ms ± 2.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [13]: %timeit X @ coef_contig.T + intercept_contig
55.5 ms ± 2.8 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Still, I would feel more comfortable if we shaped the problem to use contiguous arrays everywhere.
You are right! D*m*d.
The scipy solvers only understand 1-d arrays x to solve for, i.e. minimize(func, x, ...). The x in logistic regression is initialized as
w0 = np.zeros(
(classes.size, n_features + int(fit_intercept)), order="F", dtype=X.dtype
)
Then this is ravelled for lbfgs and newton-cg:
w0 = w0.ravel()
w0 is now (C- and F-) contiguous. Then in _w_intercept_raw
w = coef.reshape(self._loss.n_classes, -1)
This is surprisingly C-contiguous.
Here is a full example on its own:
import numpy as np
n_classes, n_features = 3, 5
w0 = np.zeros((n_classes, n_features), order="F")
w1 = w0.ravel()
w1.flags
# C_CONTIGUOUS : True
# F_CONTIGUOUS : True
w2 = w1.reshape(n_classes, -1)
w2.flags
# C_CONTIGUOUS : True
# F_CONTIGUOUS : False
w3 = w2[:, :-1]
w3.flags
# C_CONTIGUOUS : False
# F_CONTIGUOUS : False
Possible solution
Should we use np.ravel(..., order="F") and np.reshape(..., order="F")?
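A quick check of that idea (my own sketch, not code from the PR) suggests that a consistent Fortran order keeps both the coefficient block and the intercept column contiguous:

import numpy as np

n_classes, n_dof = 3, 6  # n_dof = n_features + 1, intercept in the last column
w0 = np.zeros((n_classes, n_dof), order="F")

flat = np.ravel(w0, order="F")                       # 1d view, no copy
w = np.reshape(flat, (n_classes, n_dof), order="F")  # F-contiguous again
print(w.flags["F_CONTIGUOUS"])          # True

coef, intercept = w[:, :-1], w[:, -1]
print(coef.flags["F_CONTIGUOUS"])       # True: dropping the last column keeps
                                        # the leading block contiguous in F order
print(intercept.flags["C_CONTIGUOUS"])  # True: unit-stride 1d view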
w0 = np.zeros(
(classes.size, n_features + int(fit_intercept)), order="F", dtype=X.dtype
)
Oh I missed the order="F" here! So my analysis was flawed, I think.
Maybe we should rename w0 to packed_coef when it is rectangular with the Fortran layout as is the case above, and then use the view flat_coef = np.ravel(packed_coef) for solvers that need the 1d view representation.
Then we can have centralized documentation somewhere to explain the memory layout and contiguity assumptions for packed_coef and flat_coef, and how to extract the contiguous views for the coef and intercept sub-components throughout this code.
39ffe9d is my attempt at solving this. It is a larger change:
- coef.shape is either (n_dof,) or (n_classes, n_dof) or ravelled (n_classes * n_dof,). In the multiclass case, this makes it possible to handle both ravelled and unravelled coefficients at the same time.
- In all steps involving ravel or reshape, the order is set explicitly, mostly "F".
- The benchmark script above gives 1.821 seconds in this PR vs 2.070 seconds on master.
otherwise LGTM
Co-authored-by: Alexandre Gramfort <alexandre.gramfort@m4x.org>
@agramfort Can you read minds from far distances? After about 1.5 years in the making of the losses, this week I got a bit frustrated - for no particular reason. Now there are 2 approvals, a first linear model to profit from it, and a path for further 2nd-order solvers 🥳
Thx @lorentzenchr !
* ENH replace loss in linear logistic regression
* MNT remove logistic regression's own loss functions
* CLN remove comment
* DOC add whatsnew
* DOC more precise whatsnew
* CLN restore improvements of scikit-learn#19571
* ENH improve fit time by separating mat-vec in multiclass
* DOC update whatsnew
* not only a bit ;-)
* DOC note memory benefit for multiclass case
* trigger CI
* trigger CI
* CLN rename variable to hess_prod
* DOC address reviewer comments
* CLN remove C/F for 1d arrays
* CLN rename to gradient_per_sample
* CLN rename alpha to l2_reg_strength
* ENH respect F-contiguity
* TST fix sag tests
* CLN rename to LinearModelLoss
* CLN improve comments according to review
* CLN liblinear comment
* TST add / move test to test_linear_loss.py
* CLN comment placement
* trigger CI
* CLN add comment about contiguity of raw_prediction
* DEBUG debian-32
* DEBUG test only linear_model module
* Revert "DEBUG test only linear_model module" (this reverts commit 9d6e698)
* DEBUG test -k LogisticRegression
* Revert "DEBUG test -k LogisticRegression" (this reverts commit c203167)
* Revert "DEBUG debian-32" (this reverts commit ef0b98f)
* DEBUG set n_jobs=1
* Revert "DEBUG set n_jobs=1" (this reverts commit c7f6f72)
* CLN always use n_threads=1
* CLN address review
* ENH avoid array copy
* CLN simplify L2 norm
* CLN rename w to weights
* CLN rename to hessian_sum and hx_sum
* CLN address review
* CLN rename init arg and attribute to base_loss
* CLN apply review suggestion (Co-authored-by: Alexandre Gramfort <alexandre.gramfort@m4x.org>)
* CLN base_loss instead of _loss attribute (Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>)
Co-authored-by: Alexandre Gramfort <alexandre.gramfort@m4x.org>
Reference Issues/PRs
Follow-up of #20567.
What does this implement/fix? Explain your changes.
This PR replaces the losses of LogisticRegression by the new common loss module of #20567.
Any other comments?
It implements LinearLoss in linear_model._linear_loss.py as a helper wrapper around loss functions such that the raw_prediction = X @ coef is taken into account. Ideally, this class can also be used for GLMs like PoissonRegressor.
This is similar to #19089, but replaces losses only for LogisticRegression.
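To illustrate the wrapper idea (the class name, method signatures, and base-loss interface below are assumptions for this sketch, not the actual class added in the PR, and it is written for the single-output case): the base point-wise loss only sees raw_prediction, and the wrapper applies the chain rule through raw_prediction = X @ coef.

import numpy as np

class LinearLossSketch:
    """Hypothetical wrapper: lifts a point-wise loss to a loss of linear-model coefficients."""

    def __init__(self, base_loss, fit_intercept=True):
        # base_loss is assumed to expose loss_gradient(y_true, raw_prediction)
        # returning per-sample losses and d(loss)/d(raw_prediction).
        self.base_loss = base_loss
        self.fit_intercept = fit_intercept

    def loss_gradient(self, coef, X, y, l2_reg_strength=0.0):
        if self.fit_intercept:
            weights, intercept = coef[:-1], coef[-1]
        else:
            weights, intercept = coef, 0.0
        raw_prediction = X @ weights + intercept
        loss_i, grad_i = self.base_loss.loss_gradient(y, raw_prediction)
        # L2 penalty on the weights only, not on the intercept.
        loss = loss_i.sum() + 0.5 * l2_reg_strength * (weights @ weights)
        grad_w = X.T @ grad_i + l2_reg_strength * weights
        if self.fit_intercept:
            return loss, np.r_[grad_w, grad_i.sum()]
        return loss, grad_w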