ENH Add trust-ncg solver to LogisticRegression #22236
Conversation
Co-authored-by: Ruby Werman <rubywerman@berkeley.edu>
- Now graphs using streamlit
@thomasjpfan All other solvers overlap on the graph, since they converge to the same solution. Before mentioning any other metrics or analysis, it's worth qualifying that none of these datasets are in sparse format -- even the so-called "sparse"-style dataset, where "sparse" refers to the density of useful vs. noisy features. I'll run a second batch of tests on a dense synthetic dataset and on the sparse news dataset. For now, I also observed that two things were almost always true:
During the next phase of benchmarking, I'll use the news group dataset as mentioned above, and also generate some memory-usage stats.
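To make the storage-sparsity vs. signal-sparsity distinction above concrete, here is a minimal sketch (the parameter values are illustrative, not the benchmark's actual configuration):

```python
from scipy import sparse
from sklearn.datasets import make_classification

# Dense ndarray with 5 informative features out of 20: "sparse" in the
# signal sense (few useful features), not in the storage sense.
X, y = make_classification(n_samples=100, n_features=20,
                           n_informative=5, random_state=0)
print(type(X))      # numpy.ndarray: dense storage

# A truly sparse-format matrix stores only its nonzero entries.
X_sp = sparse.csr_matrix(X)
print(X_sp.format)  # 'csr'
```

A solver can behave very differently on the two: signal-sparse dense data stresses the optimizer, while storage-sparse data stresses the matrix-vector products.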
We need to investigate why trust-ncg does not converge. Here is a simple test that should pass:
```python
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
from numpy.testing import assert_allclose

X, y = make_classification(n_features=10, n_informative=5, random_state=0)

log_reg_trust = LogisticRegression(solver="trust-ncg", fit_intercept=False)
log_reg_trust.fit(X, y)

log_reg_lb = LogisticRegression(solver="lbfgs", fit_intercept=False)
log_reg_lb.fit(X, y)

# Mostly the same as decimal=3 like in test_logistic_regression_solvers
assert_allclose(log_reg_trust.coef_, log_reg_lb.coef_, atol=1.5e-3)
```
I have not dived deeply into the error, but I see it is triggered here:
I'm also now getting an overflow error in my follow-up benchmark:

```
C:\Users\Meekail\Anaconda3\envs\scikit-dev\lib\site-packages\scipy\optimize\_trustregion_ncg.py:107: RuntimeWarning: overflow encountered in multiply
  z_next = z + alpha * d
Traceback (most recent call last):
  File "D:\work\scikit-learn\tmp_bench\benchmark_sparsity.py", line 159, in <module>
    main()
  File "D:\work\scikit-learn\tmp_bench\benchmark_sparsity.py", line 102, in main
    data.extend([build_row(X, y, conf, dset) for conf in CONFIG])
  File "D:\work\scikit-learn\tmp_bench\benchmark_sparsity.py", line 102, in <listcomp>
    data.extend([build_row(X, y, conf, dset) for conf in CONFIG])
  File "D:\work\scikit-learn\tmp_bench\benchmark_sparsity.py", line 83, in build_row
    lr.fit(X, y)
  File "d:\work\scikit-learn\sklearn\linear_model\_logistic.py", line 1615, in fit
    fold_coefs_ = Parallel(
  File "C:\Users\Meekail\Anaconda3\envs\scikit-dev\lib\site-packages\joblib\parallel.py", line 1046, in __call__
    while self.dispatch_one_batch(iterator):
  File "C:\Users\Meekail\Anaconda3\envs\scikit-dev\lib\site-packages\joblib\parallel.py", line 861, in dispatch_one_batch
    self._dispatch(tasks)
  File "C:\Users\Meekail\Anaconda3\envs\scikit-dev\lib\site-packages\joblib\parallel.py", line 779, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "C:\Users\Meekail\Anaconda3\envs\scikit-dev\lib\site-packages\joblib\_parallel_backends.py", line 208, in apply_async
    result = ImmediateResult(func)
  File "C:\Users\Meekail\Anaconda3\envs\scikit-dev\lib\site-packages\joblib\_parallel_backends.py", line 572, in __init__
    self.results = batch()
  File "C:\Users\Meekail\Anaconda3\envs\scikit-dev\lib\site-packages\joblib\parallel.py", line 262, in __call__
    return [func(*args, **kwargs)
  File "C:\Users\Meekail\Anaconda3\envs\scikit-dev\lib\site-packages\joblib\parallel.py", line 262, in <listcomp>
    return [func(*args, **kwargs)
  File "d:\work\scikit-learn\sklearn\utils\fixes.py", line 216, in __call__
    return self.function(*args, **kwargs)
  File "d:\work\scikit-learn\sklearn\linear_model\_logistic.py", line 830, in _logistic_regression_path
    opt_res = optimize.minimize(
  File "C:\Users\Meekail\Anaconda3\envs\scikit-dev\lib\site-packages\scipy\optimize\_minimize.py", line 641, in minimize
    return _minimize_trust_ncg(fun, x0, args, jac, hess, hessp,
  File "C:\Users\Meekail\Anaconda3\envs\scikit-dev\lib\site-packages\scipy\optimize\_trustregion_ncg.py", line 37, in _minimize_trust_ncg
    return _minimize_trust_region(fun, x0, args=args, jac=jac, hess=hess,
  File "C:\Users\Meekail\Anaconda3\envs\scikit-dev\lib\site-packages\scipy\optimize\_trustregion.py", line 207, in _minimize_trust_region
    p, hits_boundary = m.solve(trust_radius)
  File "C:\Users\Meekail\Anaconda3\envs\scikit-dev\lib\site-packages\scipy\optimize\_trustregion_ncg.py", line 108, in solve
    if scipy.linalg.norm(z_next) >= trust_radius:
  File "C:\Users\Meekail\Anaconda3\envs\scikit-dev\lib\site-packages\scipy\linalg\misc.py", line 145, in norm
    a = np.asarray_chkfinite(a)
  File "C:\Users\Meekail\Anaconda3\envs\scikit-dev\lib\site-packages\numpy\lib\function_base.py", line 488, in asarray_chkfinite
    raise ValueError(
ValueError: array must not contain infs or NaNs
```

I don't yet have any way to minimally reproduce this. The dataset and logistic regression configuration are as follows:

```python
X, y = datasets.make_classification(n_samples=15000, n_informative=15, n_classes=15)
X = StandardScaler().fit_transform(X)
lr = LogisticRegression(solver='trust-ncg', multi_class='ovr', penalty='none', max_iter=1000, verbose=1)
lr.fit(X, y)
```

yet this code doesn't produce the same error. I'm a bit puzzled. Regardless, this provides more evidence that there's something wrong in calculating the trust region.

Edit: I randomly got this as well, which narrows it down a smidge:

```
C:\Users\Meekail\Anaconda3\envs\scikit-dev\lib\site-packages\scipy\optimize\_trustregion_ncg.py:106: RuntimeWarning: overflow encountered in double_scalars
  alpha = r_squared / dBd
```
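The chain of warnings above is consistent with one failure mode: once the CG step size `alpha` overflows, the iterate `z_next` picks up infs, and `scipy.linalg.norm` rejects non-finite arrays via `np.asarray_chkfinite`. A minimal sketch of that mechanism (not a reproduction of the actual benchmark bug; the values are contrived):

```python
import warnings
import numpy as np

z = np.zeros(2)
alpha = 1e200           # stand-in for an overflowed step size
d = np.full(2, 1e200)   # stand-in for a huge search direction

with warnings.catch_warnings():
    # alpha * d exceeds the float64 range and overflows to inf
    warnings.simplefilter("ignore", RuntimeWarning)
    z_next = z + alpha * d

try:
    # the same finiteness check scipy.linalg.norm performs internally
    np.asarray_chkfinite(z_next)
except ValueError as e:
    print(e)  # array must not contain infs or NaNs
```

This suggests guarding the step computation (or the inputs feeding `dBd`) rather than the norm call itself.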
@Micky774 Could you merge with main? The implementation of the log-loss function has changed. I would be interested to see whether the summary remains the same with those changes.
New benchmarks with this script (run in Jupyter). Tested w/
For dense data, it looks like lbfgs is always better than trust-ncg. From #17125, we were trying to improve the situation for sparse data. Is the benchmark different with sparse data?
@Micky774 Thanks for those benchmarks. In my own tries at benchmarking GLMs, I noticed large differences between synthetic and real data. Could you also run timings on the kicks dataset as in #23314 (comment) for lbfgs and trust-ncg (no need for sag or saga)? Also there, one could run on dense and on sparse data.
```python
# TODO: remove local LinearModelLoss after renaming `loss`
elif solver == "trust-ncg":
    loss_ = LinearModelLoss(
        base_loss=HalfBinomialLoss(), fit_intercept=fit_intercept
    )
```
The local instantiation of LinearModelLoss counteracts a bug caused by the reassignment of loss to a scalar value later in the code (at the end of the loop). In a separate PR, we ought to rename either the loss referring to the internal loss module or the loss referring to the scalar loss value.
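The shadowing pattern described above can be illustrated with a hypothetical standalone sketch (this is not the scikit-learn code; `Loss` here is a made-up stand-in for the loss object):

```python
# Hypothetical: a loss *object* whose name is later rebound to the
# scalar loss *value*, so subsequent iterations see the wrong type.
class Loss:
    def __call__(self, w):
        return float(sum(x * x for x in w))

loss = Loss()
for i in range(2):
    if not callable(loss):
        # Second pass: `loss` is now a float, not a Loss object.
        print("loss was shadowed by a scalar on iteration", i)
        break
    # Rebinding: the name `loss` now points at a scalar result.
    loss = loss([1.0, 2.0])
```

Using two distinct names (e.g. `loss_` for the object, `loss_value` for the scalar) avoids the problem entirely, which is what the TODO comment proposes.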
Script for reference. @lorentzenchr The time taken by … [plot]. Interestingly, it seems that … [plot].
@Micky774 Thanks again for these detailed analyses: much appreciated! Given the results, I guess we can close this PR, because unfortunately there is no improvement over the already existing solvers.
It would be more informative to re-run this benchmark with various tol values.
Initial results using this script: [plots]. Despite it outperforming w/ respect to …
yay!! cheers
Thanks for running the benchmarks with various tol values. Indeed, it does not seem to be good enough to be worth exposing in the scikit-learn API.
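For reference, a tol sweep like the one requested above can be sketched as follows (the dataset size, tol grid, and solver choice here are illustrative assumptions, not the benchmark's actual configuration):

```python
from time import perf_counter

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, n_features=20,
                           n_informative=10, random_state=0)

# Tighter tol -> more iterations; comparing solvers at a single tol
# can hide this trade-off, hence the sweep.
for tol in (1e-2, 1e-4, 1e-6):
    start = perf_counter()
    clf = LogisticRegression(solver="lbfgs", tol=tol, max_iter=1000)
    clf.fit(X, y)
    print(f"tol={tol:g}  n_iter={clf.n_iter_[0]}  "
          f"time={perf_counter() - start:.3f}s")
```

The same loop run for each candidate solver gives a fair iterations-vs-accuracy picture at every tolerance level.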
Reference Issues/PRs
Fixes #17125
Picks up stalled PR #17877
What does this implement/fix? Explain your changes.
PR #17877: Implements trust-ncg option in LogisticRegression when multi_class = 'multinomial'.
This PR: Addresses remaining review comments, including generating benchmarks against the new solver. Reconciles the implementation with the overhauled loss module. Expands test coverage.
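At its core, the proposed solver hands the logistic loss, its gradient, and a Hessian-vector product to SciPy's trust-region Newton-CG optimizer. The following is a standalone sketch of that mechanism, not the PR's code: the loss and derivatives are reimplemented here for an unpenalized binary problem, with a small L2 term added so the minimum is finite and unique.

```python
import numpy as np
from scipy import optimize
from scipy.special import expit
from sklearn.datasets import make_classification

X, y = make_classification(n_features=10, n_informative=5, random_state=0)

def loss(w):
    # Binary logistic loss with labels mapped {0, 1} -> {-1, +1},
    # plus a small ridge term for a unique, finite minimum.
    y_pm = 2 * y - 1
    return np.logaddexp(0, -y_pm * (X @ w)).sum() + 0.5 * w @ w

def grad(w):
    # X^T (sigmoid(Xw) - y) + w
    return X.T @ (expit(X @ w) - y) + w

def hessp(w, v):
    # Hessian-vector product: X^T diag(p * (1 - p)) X v + v,
    # computed without forming the Hessian explicitly.
    p = expit(X @ w)
    return X.T @ ((p * (1 - p)) * (X @ v)) + v

res = optimize.minimize(loss, np.zeros(X.shape[1]), jac=grad,
                        hessp=hessp, method="trust-ncg")
print(res.success, res.x.round(3))
```

Supplying `hessp` instead of a full `hess` is what makes trust-ncg attractive for high-dimensional (and potentially sparse) problems: only matrix-vector products with X are needed.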
Any other comments?
This picks up the wonderful work done by @rithvikrao and @rubywerman