ENH Add trust-ncg solver to LogisticRegression #22236


Closed
wants to merge 41 commits into from

Conversation

Micky774
Contributor

@Micky774 Micky774 commented Jan 17, 2022

Reference Issues/PRs
Fixes #17125
Picks up stalled PR #17877

What does this implement/fix? Explain your changes.
PR #17877: Implements trust-ncg option in LogisticRegression when multi_class = 'multinomial'.
This PR: Addresses remaining review comments, including generating benchmarks against the new solver. Reconciles implementation with overhauled loss module implementation. Expands test coverage.

Any other comments?
This picks up the wonderful work done by @rithvikrao and @rubywerman

@Micky774
Contributor Author

@thomasjpfan
Okay, so it looks like in all of my initial tests, using datasets created via datasets.make_classification(...) with varying arguments, trust-ncg never converges to the same results as the other solvers -- if it converges at all. I haven't yet counted how often it does vs. does not converge. The lack of convergence was confirmed by comparing coef_ values, but it is also readily visible in graphs of the NLL. See here:
image

All other solvers overlap on the graph, since they converge to the same solution. Before mentioning any other metrics or analysis, it's worth qualifying that none of these datasets are in sparse format, even the so-called "sparse"-style dataset -- that name refers to the density of useful vs. noisy features. I'll run a second batch of tests on a dense synthetic dataset, and on the sparse news dataset.

For now, I also observed that two things were almost always true:

  1. trust-ncg failed to converge because "...a bad approximation caused failure to predict improvement..."
  2. trust-ncg had n_iter_ <= 10 after fit

During the next phase of benchmarking, I'll use the news group dataset mentioned above, as well as generate some memory-usage stats.

@Micky774
Contributor Author

@thomasjpfan @ogrisel

Member

@thomasjpfan thomasjpfan left a comment


We need to investigate why trust-ncg does not converge. Here is a simple test that should pass:

from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
from numpy.testing import assert_allclose

X, y = make_classification(n_features=10, n_informative=5, random_state=0)
log_reg_trust = LogisticRegression(solver="trust-ncg", fit_intercept=False)
log_reg_trust.fit(X, y)

log_reg_lb = LogisticRegression(solver="lbfgs", fit_intercept=False)
log_reg_lb.fit(X, y)

# Roughly equivalent to decimal=3, as in test_logistic_regression_solvers
assert_allclose(log_reg_trust.coef_, log_reg_lb.coef_, atol=1.5e-3)

I have not dived deeply into the error, but I see it is triggered here:

https://github.com/scipy/scipy/blob/4871f3d1c61bdb296ae03e3480f5f584f5c67256/scipy/optimize/_trustregion.py#L239-L241
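One way to probe the failure independently of this PR is to hand scipy's trust-ncg an unpenalized logistic NLL directly, bypassing scikit-learn entirely. This is a hypothetical minimal sketch (not code from the PR); whether scipy reports the "bad approximation" failure here depends on the problem instance:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit
from sklearn.datasets import make_classification

X, y = make_classification(n_features=10, n_informative=5, random_state=0)
n = len(y)

def nll(w):
    # Mean negative log-likelihood of logistic regression (no penalty).
    p = expit(X @ w)
    return -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))

def grad(w):
    return X.T @ (expit(X @ w) - y) / n

def hessp(w, v):
    # Hessian-vector product: X^T diag(p(1-p)) X v / n
    p = expit(X @ w)
    return X.T @ (p * (1 - p) * (X @ v)) / n

res = minimize(nll, np.zeros(X.shape[1]), method="trust-ncg",
               jac=grad, hessp=hessp)
print(res.status, res.message)
```

If this standalone setup converges cleanly, the problem is more likely in how the solver is wired into `_logistic_regression_path` than in scipy's trust-region code itself.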

@Micky774
Contributor Author

Micky774 commented Jan 28, 2022

I'm also now getting an overflow error in my follow-up benchmark:

C:\Users\Meekail\Anaconda3\envs\scikit-dev\lib\site-packages\scipy\optimize\_trustregion_ncg.py:107: RuntimeWarning: overflow encountered in multiply
  z_next = z + alpha * d
Traceback (most recent call last):
  File "D:\work\scikit-learn\tmp_bench\benchmark_sparsity.py", line 159, in <module>
    main()
  File "D:\work\scikit-learn\tmp_bench\benchmark_sparsity.py", line 102, in main
    data.extend([build_row(X, y, conf, dset) for conf in CONFIG])
  File "D:\work\scikit-learn\tmp_bench\benchmark_sparsity.py", line 102, in <listcomp>
    data.extend([build_row(X, y, conf, dset) for conf in CONFIG])
  File "D:\work\scikit-learn\tmp_bench\benchmark_sparsity.py", line 83, in build_row
    lr.fit(X, y)
  File "d:\work\scikit-learn\sklearn\linear_model\_logistic.py", line 1615, in fit
    fold_coefs_ = Parallel(
  File "C:\Users\Meekail\Anaconda3\envs\scikit-dev\lib\site-packages\joblib\parallel.py", line 1046, in __call__
    while self.dispatch_one_batch(iterator):
  File "C:\Users\Meekail\Anaconda3\envs\scikit-dev\lib\site-packages\joblib\parallel.py", line 861, in dispatch_one_batch
    self._dispatch(tasks)
  File "C:\Users\Meekail\Anaconda3\envs\scikit-dev\lib\site-packages\joblib\parallel.py", line 779, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "C:\Users\Meekail\Anaconda3\envs\scikit-dev\lib\site-packages\joblib\_parallel_backends.py", line 208, in apply_async
    result = ImmediateResult(func)
  File "C:\Users\Meekail\Anaconda3\envs\scikit-dev\lib\site-packages\joblib\_parallel_backends.py", line 572, in __init__
    self.results = batch()
  File "C:\Users\Meekail\Anaconda3\envs\scikit-dev\lib\site-packages\joblib\parallel.py", line 262, in __call__
    return [func(*args, **kwargs)
  File "C:\Users\Meekail\Anaconda3\envs\scikit-dev\lib\site-packages\joblib\parallel.py", line 262, in <listcomp>
    return [func(*args, **kwargs)
  File "d:\work\scikit-learn\sklearn\utils\fixes.py", line 216, in __call__
    return self.function(*args, **kwargs)
  File "d:\work\scikit-learn\sklearn\linear_model\_logistic.py", line 830, in _logistic_regression_path
    opt_res = optimize.minimize(
  File "C:\Users\Meekail\Anaconda3\envs\scikit-dev\lib\site-packages\scipy\optimize\_minimize.py", line 641, in minimize
    return _minimize_trust_ncg(fun, x0, args, jac, hess, hessp,
  File "C:\Users\Meekail\Anaconda3\envs\scikit-dev\lib\site-packages\scipy\optimize\_trustregion_ncg.py", line 37, in _minimize_trust_ncg
    return _minimize_trust_region(fun, x0, args=args, jac=jac, hess=hess,
  File "C:\Users\Meekail\Anaconda3\envs\scikit-dev\lib\site-packages\scipy\optimize\_trustregion.py", line 207, in _minimize_trust_region
    p, hits_boundary = m.solve(trust_radius)
  File "C:\Users\Meekail\Anaconda3\envs\scikit-dev\lib\site-packages\scipy\optimize\_trustregion_ncg.py", line 108, in solve
    if scipy.linalg.norm(z_next) >= trust_radius:
  File "C:\Users\Meekail\Anaconda3\envs\scikit-dev\lib\site-packages\scipy\linalg\misc.py", line 145, in norm
    a = np.asarray_chkfinite(a)
  File "C:\Users\Meekail\Anaconda3\envs\scikit-dev\lib\site-packages\numpy\lib\function_base.py", line 488, in asarray_chkfinite
    raise ValueError(
ValueError: array must not contain infs or NaNs

I don't yet have a minimal reproduction for this. The dataset and logistic regression configuration are as follows:

from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

X, y = datasets.make_classification(n_samples=15000, n_informative=15, n_classes=15)
X = StandardScaler().fit_transform(X)
lr = LogisticRegression(solver='trust-ncg', multi_class='ovr', penalty='none', max_iter=1000, verbose=1)
lr.fit(X, y)

yet this code doesn't produce the same error. I'm a bit puzzled. Regardless, this provides more evidence that something is wrong in the trust-region calculation.

Edit: I randomly got this as well, which narrows it down a bit:

C:\Users\Meekail\Anaconda3\envs\scikit-dev\lib\site-packages\scipy\optimize\_trustregion_ncg.py:106: RuntimeWarning: overflow encountered in double_scalars
  alpha = r_squared / dBd

@lorentzenchr
Member

@Micky774 Could you merge with main? The implementation of the log-loss function has changed. I would be interested to see whether the conclusions remain the same with those changes.

@Micky774 Micky774 changed the title [WIP] ENH Add trust-ncg option to LogisticRegression ENH Add trust-ncg solver to LogisticRegression Jun 8, 2022
@Micky774
Contributor Author

Micky774 commented Jun 8, 2022

New benchmarks with this script (run in Jupyter). Tested with X arrays of shape (n_samples, 100) and y of shape (n_samples,) ranging over 4 classes. All data are dense. Repeated 20 times.
output

@thomasjpfan
Member

For dense data, it looks like lbfgs is always better than trust-ncg. From #17125, we were trying to improve the situation for sparse data. Is the benchmark different with sparse data?

@Micky774
Contributor Author

Micky774 commented Jun 9, 2022

Additional benchmarks with sparse X at density=0.1:

sparse_output

Overall, it seems lbfgs > trust-ncg? Haven't tested memory footprint yet, will probably do so tomorrow.
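For reference, the sparse benchmark setup can be sketched roughly like this (an illustrative reconstruction, not the actual benchmark script; `"trust-ncg"` as a solver value only exists on this PR's branch, so the sketch uses lbfgs):

```python
import numpy as np
import scipy.sparse as sp
from sklearn.linear_model import LogisticRegression

# CSR design matrix with density 0.1, matching the sparse benchmark above.
rng = np.random.default_rng(0)
X = sp.random(5000, 100, density=0.1, format="csr", random_state=0)
y = rng.integers(0, 4, size=5000)  # 4 classes, as in the dense benchmark

# On the PR branch, solver="trust-ncg" would be benchmarked the same way.
clf = LogisticRegression(solver="lbfgs", max_iter=1000).fit(X, y)
print(clf.coef_.shape)  # (4, 100)
```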

@lorentzenchr
Member

@Micky774 Thanks for those benchmarks. In my own attempts at benchmarking GLMs, I noticed large differences between synthetic and real data. Could you also run timings on the kicks dataset as in #23314 (comment) for lbfgs and trust-ncg (no need for sag or saga)? There, too, one could run on dense and on sparse X.

Comment on lines +437 to +441
# TODO: remove local LinearModelLoss after renaming `loss`
elif solver == "trust-ncg":
loss_ = LinearModelLoss(
base_loss=HalfBinomialLoss(), fit_intercept=fit_intercept
)
Contributor Author


The local instantiation of LinearModelLoss works around a bug caused by the reassignment of loss to a scalar value later in the code (at the end of the loop). In a separate PR, we should rename either the loss that refers to the internal loss module or the loss that refers to the scalar loss value.

@Micky774
Contributor Author

Micky774 commented Jun 9, 2022

Script for reference

@lorentzenchr The time taken by trust-ncg is significantly worse than for lbfgs:

Plot

9de6d83d-edb8-44bd-b5c8-50bd95ce9fef

Interestingly, it seems that trust-ncg does outperform lbfgs when multi_class=multinomial and penalty="l2"

Plot

8181002e-992d-4fb7-b048-a53fe9e40e37

@lorentzenchr
Member

@Micky774 Thanks again for these detailed analyses: much appreciated! Given the results, I guess we can close this PR, since unfortunately there is no improvement over the existing solvers.

@ogrisel
Member

ogrisel commented Jun 13, 2022

It would be more informative to re-run this benchmark with various tol values, because they do not necessarily mean the same thing for different solvers. It would also be interesting to try on the same dataset as in:

#23314 (comment)
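The suggested experiment can be sketched as a tol sweep recording train time, iteration count, and final objective per solver. This is an illustrative outline, not the benchmark script itself, and `"trust-ncg"` as a solver value only exists on this PR's branch:

```python
from time import perf_counter

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, n_features=50, random_state=0)

results = []
for tol in (1e-2, 1e-4, 1e-6, 1e-8):
    # On the PR branch, "trust-ncg" would be added to this tuple.
    for solver in ("lbfgs",):
        clf = LogisticRegression(solver=solver, tol=tol, max_iter=10_000)
        t0 = perf_counter()
        clf.fit(X, y)
        results.append((solver, tol, perf_counter() - t0, clf.n_iter_[0]))

for row in results:
    print(row)
```

Plotting final loss against train time for each (solver, tol) pair then gives a fair time-to-suboptimality comparison regardless of what tol means internally to each solver.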

@Micky774
Contributor Author

Micky774 commented Jun 20, 2022

It would be more informative to re-run this benchmark with various tol values, because they do not necessarily mean the same thing for different solvers. It would also be interesting to try on the same dataset as in:

#23314 (comment)

Initial results using this script:

Plots

Iterations vs suboptimality

ccc00d35-b6dd-49b8-9ca4-111ae96db186

Train time vs suboptimality

c0cf96f5-4242-4825-b069-ababba81c50c

Despite outperforming with respect to n_iter, it falls behind significantly when measured by train time. It also apparently converges to a worse solution when measured with the HalfBinomial loss; however, all solvers apparently converge to the same NLL value, so I'm not sure how significant that is.

@rubywerman
Contributor

yay!! cheers

@ogrisel
Member

ogrisel commented Jun 27, 2022

Thanks for running the benchmarks with various tol values. Indeed, it does not seem to be good enough to be worth exposing in the scikit-learn API.


Successfully merging this pull request may close these issues.

LogisticRegression memory consumption goes crazy on 0.22+