[RFC] Improve the Ridge solver speed / convergence tradeoff with better solver defaults #19615
+1 for the plan except the "other ideas". I don't think that adding L-BFGS is necessary, and the QR-based solver also seems like overkill. For 2, you say `lsqr` for the sparse case?
I am not sure what you mean. I edited the description to remove a previous version of item 2 because it was wrong (my `tol` was too lax and that was leading me to wrong conclusions).
I'm also on board with the Proposed plan of actions, and, like @agramfort, don't think that adding lbfgs is necessary. A few further comments:
Remove `sparse_cg`? I would think it's the right thing to do for sparse X, no?
Also in my benchmarks, …
@ogrisel Which benchmark are you referring to?
Sorry, that suggestion of mine was typed too quickly and too late in the evening. I mixed up
On the corresponding LSMR page: …
Interactive tests that I did by tweaking the end of the gist linked in the description.
Regarding adding or not LBFGS as a solver to `Ridge`: if we would like to add support for such an option …
+1
I am not sure if there is a performance drawback (maybe LBFGS would be faster than the CD used in ElasticNet?).
I am sure you can find data for which LBFGS > coordinate descent and vice versa.
@glemaitre @agramfort I opened #19772 for …
I dared to check the 2nd point with #22910. Those are pretty tight tests. What is left to do is maybe decreasing (making tighter) the default `tol`.
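For illustration only (not from the thread): a minimal sketch of how a tighter `tol` changes the `lsqr` solution relative to the exact SVD solution. The dataset, `alpha` and tolerance values are arbitrary assumptions.

```python
# Illustrative sketch: impact of `tol` on the lsqr solver, using the SVD
# solution as reference. Dataset and alpha are arbitrary assumptions.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=2_000, n_features=500, noise=1.0, random_state=0)
alpha = 1e-3

ref = Ridge(alpha=alpha, solver="svd").fit(X, y).coef_
loose = Ridge(alpha=alpha, solver="lsqr", tol=1e-3).fit(X, y).coef_   # historical default tol
tight = Ridge(alpha=alpha, solver="lsqr", tol=1e-10).fit(X, y).coef_

print("tol=1e-3 :", np.max(np.abs(loose - ref)))
print("tol=1e-10:", np.max(np.abs(tight - ref)))
```

With the loose `tol=1e-3`, the lsqr coefficients can sit noticeably further from the SVD reference than with a tight tolerance, which is the motivation for tightening the default.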
Let me reopen this issue because I believe that the default choice of parameters for the solver of the `Ridge` estimator can still be improved.
Based on complementary experiments conducted in #22855 (comment), it seems that we should probably use …
There were also other interesting potential improvements proposed by @lorentzenchr in #19615 (comment).
Related issue when …
This statement is for the unpenalized case and might apply for tiny alpha as well. I think `lsqr`/`lsmr` could qualify as a good default, but we should also do some experiments for `n_features > n_samples`.
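To make the `lsqr`/`lsmr` suggestion concrete, here is a small sketch (added for illustration, not from the thread) of the kind of experiment that could be run for `n_features > n_samples`. Both SciPy routines accept a `damp` parameter, so `damp = sqrt(alpha)` solves the ridge problem directly; the dataset and `alpha` below are arbitrary assumptions.

```python
# Illustrative sketch: solving min ||Xw - y||^2 + alpha ||w||^2 with SciPy's
# LSQR and LSMR via damp = sqrt(alpha), on a wide (n_features > n_samples)
# problem, compared against the exact SVD-based Ridge solution.
import numpy as np
from scipy.sparse.linalg import lsqr, lsmr
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=500, n_features=5_000, noise=1.0, random_state=0)
alpha = 1.0

w_lsqr = lsqr(X, y, damp=np.sqrt(alpha), atol=1e-10, btol=1e-10)[0]
w_lsmr = lsmr(X, y, damp=np.sqrt(alpha), atol=1e-10, btol=1e-10)[0]
w_svd = Ridge(alpha=alpha, fit_intercept=False, solver="svd").fit(X, y).coef_

print(np.max(np.abs(w_lsqr - w_svd)), np.max(np.abs(w_lsmr - w_svd)))
```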
I did some follow-up work for the discussions on potential problems for `Ridge(normalize=False)` in #19426 and #17444 in the following gist: https://gist.github.com/ogrisel/cf7ac8bca6725ec02d4cc8d03ce096a7

This gist compares the results of a minimal implementation of ridge with L-BFGS by @agramfort against scikit-learn's `Ridge(normalize=False, solver="cholesky")`, `Ridge(normalize=False, solver="lsqr")`, `Ridge(normalize=False, solver="svd")` and other solvers.

The main conclusion is that those models seem to converge to the same solution if we set a low value of `tol` and if we are not too strict on the comparison tolerance: see how I defined `atol` in the gist. I cannot get convergence to machine precision (only around 1e-6/1e-7 in float64), but maybe this is expected. I am not so sure: I tried to tweak `pgtol` on top of `factr` but it did not improve.

On large-ish problems, L-BFGS can be significantly faster than our always accurate SVD solver (but not always, it depends on the regularization), yet never faster than the `lsqr` solver (based on `scipy.sparse.linalg.lsqr`). The cholesky solver (our default for dense data) is sometimes faster and sometimes slower than lsqr and L-BFGS (depending on the regularization) but always faster than SVD (at least with high regularization). The sag/saga solvers are not necessarily as fast as they could be: I noticed that they do not benefit from multi-threading on a multicore machine, so that might be the reason. I have never seen them run faster than lsqr. sag/saga can be significantly faster than L-BFGS despite using fewer cores, on problems with `n_samples=5e3` / `n_features=1e3` and not too much regularization (alpha=1e-6).

There are still problems with `normalize=True` as reported in #17444, but maybe we don't care so much because we are going to deprecate this option: see #17772 by @maikia that I plan to review soon.
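For readers without access to the gist, the comparison it performs is roughly of the following shape: a hypothetical minimal L-BFGS ridge (via `scipy.optimize.minimize`) checked against scikit-learn's `svd` and `lsqr` solvers. This is a sketch under assumed data and tolerances, not the gist itself.

```python
# Hypothetical sketch: minimal L-BFGS ridge vs. scikit-learn solvers
# (illustrative only; the actual gist linked above may differ).
import numpy as np
from scipy.optimize import minimize
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=5_000, n_features=1_000, noise=1.0, random_state=0)
alpha = 1e-6

def objective_and_grad(w):
    # 1/2 ||Xw - y||^2 + alpha/2 ||w||^2: same minimizer as Ridge's
    # ||Xw - y||^2 + alpha ||w||^2, up to a constant factor.
    residual = X @ w - y
    obj = 0.5 * residual @ residual + 0.5 * alpha * w @ w
    grad = X.T @ residual + alpha * w
    return obj, grad

# ftol/gtol are the minimize() counterparts of the factr/pgtol knobs
# mentioned above.
res = minimize(objective_and_grad, np.zeros(X.shape[1]), jac=True,
               method="L-BFGS-B",
               options={"maxiter": 1000, "ftol": 1e-12, "gtol": 1e-8})

coef_lbfgs = res.x
coef_svd = Ridge(alpha=alpha, fit_intercept=False, solver="svd").fit(X, y).coef_
coef_lsqr = Ridge(alpha=alpha, fit_intercept=False, solver="lsqr", tol=1e-12).fit(X, y).coef_

# Agreement is typically only ~1e-6/1e-7 in float64, not machine precision.
print(np.max(np.abs(coef_lbfgs - coef_svd)), np.max(np.abs(coef_lsqr - coef_svd)))
```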
Proposed plan of actions:

- ~~deprecate `normalize=True` in DEP Deprecate 'normalize' in ridge models #17772 and therefore put Ridge(normalize=True) fails check_sample_weights_invariance #17444 on hold (or less of a priority)~~ actually no, see: [MRG] FIX sample_weight invariance for linear models #19616.
- add tests that check that the solvers converge to the same solution when `tol` is low, e.g. based on the `check_min` of my gist (a hypothetical sketch of such a check is shown after the lists below). Tight tests (but without checking the minimum explicitly): TST tight and clean tests for Ridge #22910
- evaluate `lsqr` more systematically: it seems to be very promising, both in terms of speed and accuracy of the solution (according to my `check_min` thingy).
- reconsider whether the default `tol=1e-3` is safe. It's probably very solver dependent, but I doubt that it's a good idea for the fast `lsqr` solver for instance (to be benchmarked if we do the above evaluation). ENH change Ridge tol to 1e-4 #24465

Other ideas:

- add L-BFGS as a solver? Not sure it's worth it compared to `lsqr` or `cholesky` though: they seem to always be more accurate than l-bfgs and faster (especially `lsqr`). And l-bfgs can be very slow when alpha is small.
- deprecate `sparse_cg` in favor of the `lsqr` solver?
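The `check_min` helper itself lives in the gist and is not reproduced here. Below is a hypothetical sketch of what such a minimum check could look like: perturb the fitted coefficients and assert that the ridge objective does not decrease, so the solver really ended near a minimum. All names, datasets and tolerances are assumptions, not the gist's actual code.

```python
# Hypothetical check_min-style assertion (the actual helper in the gist may
# differ): the ridge objective at the solver's solution should not be larger
# than at slightly perturbed coefficients.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

def ridge_objective(X, y, w, alpha):
    return np.sum((X @ w - y) ** 2) + alpha * np.sum(w ** 2)

def check_min(X, y, coef, alpha, n_trials=30, scale=1e-4, rtol=1e-10, seed=0):
    rng = np.random.RandomState(seed)
    obj = ridge_objective(X, y, coef, alpha)
    for _ in range(n_trials):
        perturbed = coef + scale * rng.randn(coef.shape[0])
        # No perturbation should reach a lower objective (up to rtol).
        assert obj <= ridge_objective(X, y, perturbed, alpha) * (1 + rtol)

X, y = make_regression(n_samples=1_000, n_features=300, noise=1.0, random_state=0)
alpha = 1.0
for solver in ["svd", "cholesky", "lsqr", "sparse_cg"]:
    model = Ridge(alpha=alpha, fit_intercept=False, solver=solver, tol=1e-10).fit(X, y)
    check_min(X, y, model.coef_, alpha)
```

Such a perturbation-based check only verifies that the solution is a near-minimizer; it does not pin the accuracy to machine precision, which matches the observation above that agreement between solvers is around 1e-6/1e-7 in float64.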