[MRG+1] Expose SAGA solver for ElasticNet regression #12907 #12966
Conversation
There is a test failure in test_enet_toy in 2 of the 4 Travis settings that I can't reproduce on my machine, so I'm a bit stuck. Can someone please help figure this out so the tests pass? The failing settings use the latest numpy/scipy versions.
    Algorithm to use in the optimization problem.

    - 'auto' chooses the solver automatically based on the type of data.
      If the data is F-contiguous or a sparse 'csc' matrix, it chooses
This seems quite an obscure criterion... Is the memory order of the data really a sensible criterion to say one solver is more appropriate than the other?
If you have a large dataset X, reordering from C to F or vice versa requires a full copy of the array X, which might be a heavy (memory-wise) and unwanted operation. The nice point is that 'cd' needs F-ordered and 'saga' needs C-ordered arrays. Furthermore, without benchmarks, we can't tell which solver is better suited for a given X, so I came up with this criterion. I can easily change 'auto' to 'cd' or remove 'auto'. It is just a suggestion.
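For illustration, a minimal sketch of the contiguity-based dispatch described above; the function name and the exact rule are my own paraphrase, not code from this PR:

```python
import numpy as np
import scipy.sparse as sp

def choose_solver(X):
    """Pick the solver whose preferred memory layout matches X,
    avoiding a full copy of the data."""
    if sp.issparse(X):
        # 'cd' wants CSC, 'saga' wants CSR
        return 'cd' if X.format == 'csc' else 'saga'
    # np.isfortran is True for F-contiguous (column-major) arrays
    return 'cd' if np.isfortran(X) else 'saga'
```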
I'd be inclined to remove auto, but would be interested to hear from @agramfort
Alternatively, those benchmarks we don't have yet would be an argument for 'auto'. Would you be so nice as to run some benchmarks on this @lorentzenchr? It may turn out that the copy is not that significant in most cases compared to the rest of the method after all.
+1 for removing auto for now but I would be very interested in some extensive benchmarks to guide future decisions on "smart" solver selection.
Benchmarks based on bench_glmnet.py indicate that coordinate descent is the clear winner, regardless of the contiguity of the data. So 'auto' will be removed and 'cd' made the default solver.
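For reference, a minimal sketch of the kind of timing comparison meant here, adapted in spirit from bench_glmnet.py; the dataset size, alpha, and the `solver` argument (this PR's proposal, not released scikit-learn API) are assumptions:

```python
import time
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet

X, y = make_regression(n_samples=50_000, n_features=200, noise=1.0,
                       random_state=0)
for order in ('C', 'F'):
    # benchmark both memory layouts to test the contiguity criterion
    X_ord = np.asarray(X, order=order)
    for solver in ('cd', 'saga'):  # `solver` as proposed in this PR
        model = ElasticNet(alpha=1.0, l1_ratio=0.5, tol=1e-6, solver=solver)
        tic = time.time()
        model.fit(X_ord, y)
        print(f"order={order} solver={solver}: {time.time() - tic:.2f}s")
```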
Even with the same versions of Python, numpy and scipy, I'm not able to reproduce the Travis test failure for test_enet_toy on my machine. I don't know what to do. Sorry.
I think I figured out why the test on Travis fails. See issue #13021.
With a new solver in ElasticNet, the current file structure does not make sense to me. If I were to do it from scratch (without backward-incompatibility concerns), I would separate the optimization algorithms (solvers) from the actual models (API/estimators). That is, I would put the class ElasticNet in a file like enet_regression.py (and in the future maybe also Lasso, ElasticNetCV, ...). What do you think?
I guess we can only make such changes once we're clear on #12927.
@adrinjalali I hope I resolved your review comments the way you intended.
    if isinstance(self.precompute, str):
        raise ValueError('precompute should be one of True, False or'
                         ' array-like. Got %r' % self.precompute)
    elif self.precompute is not False and self.solver == 'saga':
I don't think it's wrong, but `self.precompute != False` (or `not self.precompute`, if `None` should be considered the same here) seems more natural to me.
`self.precompute != False` gives "E712 comparison to False should be 'if cond is not False' or ...". I personally find `if not self.precompute` less readable than `if self.precompute is not False`.
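A tiny illustration (mine, not from the PR) of how the two spellings differ for values other than the literal False:

```python
# `x is not False` excludes only the literal False, while `not x` also
# treats other "falsy" values such as None and empty containers as False.
for precompute in (False, None, True, 'auto', []):
    print(precompute, precompute is not False, not precompute)
```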
I'd suggest having a separate PR for that. Please also add an entry to the change log at
I have not yet reviewed the tests.
    @@ -560,6 +561,7 @@ class ElasticNet(LinearModel, RegressorMixin):
        on an estimator with ``normalize=False``.

        precompute : True | False | array-like
            relevant only if ``solver='cd'``
Please start with uppercase and end with a period. Note that in HTML the line break will disappear.
Do you want to keep the line break in HTML?
* new argument 'solver' in ElasticNet
* check argument 'l1_ratio' to be in valid range
* add same tests for saga as for coordinate descent
Yes, I think that structure is cleaner...
One nitpick. Nice work!
@adrinjalali With your +1 this would be ready for merge. At least, I hope so 😏
@lorentzenchr thanks for your work. Could you please post a screenshot of your benchmark results? Aren't there any cases where SAGA is faster than CD (e.g. with a large number of samples)?
Overall this PR looks nice but I would be more confident in the results if we could check the dual gap for the saga solver in the tests. See more details below:
    For ``solver='cd'``, the number of iterations run by the solver to
    reach the specified tolerance.
    For ``solver='saga'``, the number of full passes on all samples until
    convergence.
I realize that the `dual_gap_` attribute is not mentioned here. This is an oversight that should be fixed.
It never has been. I can add it in this PR or open a new issue. What do you prefer?
                        is_saga=True)
        coef_[k] = this_coef
        self.n_iter_.append(this_iter)
    self.dual_gap_ = None
I think we should compute the `dual_gap_` after the final SAGA iteration, not necessarily to use it as a convergence criterion for SAGA itself, but as a final check so the user can verify that the two solvers find solutions of similar quality. I don't really know for sure what `tol` means in the context of saga. @TomDLT @arthurmensch @agramfort do you know if we could provide any kind of guarantee w.r.t. the dual gap when using the natural stopping criterion of the `sag_solver`?
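As a sketch of the kind of final quality check meant here (the helper below is mine and mirrors the documented ElasticNet objective, not the internal dual-gap code):

```python
import numpy as np

def enet_objective(X, y, coef, alpha, l1_ratio):
    # 1 / (2 * n_samples) * ||y - X w||^2_2
    #   + alpha * l1_ratio * ||w||_1
    #   + 0.5 * alpha * (1 - l1_ratio) * ||w||^2_2
    n_samples = X.shape[0]
    resid = y - X @ coef
    return (resid @ resid) / (2 * n_samples) \
        + alpha * l1_ratio * np.abs(coef).sum() \
        + 0.5 * alpha * (1 - l1_ratio) * (coef @ coef)
```

Fitting both solvers on the same data and comparing these objective values would flag a lower-quality SAGA solution even without computing the dual gap.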
From what I see, saga uses the same convergence criterion but without the additional check for the dual gap. Compare

    if ((max_weight != 0 and max_change / max_weight <= tol)

with scikit-learn/sklearn/linear_model/cd_fast.pyx, lines 206 to 207 (at 7bfed17):

    if (w_max == 0.0 or
        d_w_max / w_max < d_w_tol or
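In plain Python, the stopping rule both excerpts implement is roughly (my paraphrase, not library code):

```python
def coef_converged(d_w_max, w_max, tol):
    # Stop when the largest single-coefficient update of the last pass is
    # small relative to the largest coefficient magnitude (or all weights
    # are still zero).
    return w_max == 0.0 or d_w_max / w_max < tol
```

The difference is that when this fires, CD additionally computes the dual gap and only stops if it is small enough, whereas the saga path stops right away.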
I ran a couple of benchmarks. Interestingly, with

    SGDRegressor(l1_ratio=0.5, penalty="elasticnet", max_iter=1000, tol=1e-6,
                 n_iter_no_change=1, alpha=1).fit(X, y)

it does not pick up the same non-zero features as SAGA and CD, but there are also only 5 non-zeros (because of correlation I guess there is redundancy), and the cross-validation accuracy of SGDRegressor is the same or even slightly higher.
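A self-contained version of that experiment might look as follows; the dataset shape and correlation structure are my assumptions, not the commenter's exact setup:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor

# correlated features (low effective rank) with a small informative support
X, y = make_regression(n_samples=100_000, n_features=100, n_informative=5,
                       effective_rank=10, noise=1.0, random_state=0)
sgd = SGDRegressor(penalty="elasticnet", l1_ratio=0.5, alpha=1,
                   max_iter=1000, tol=1e-6, n_iter_no_change=1).fit(X, y)
print((sgd.coef_ != 0).sum(), "non-zero coefficients")
```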
Note that in that large sample regime, … And the selected support from …
Will review this during the sprint.
But more importantly, is there really a reason to expose the saga solver in ElasticNet if the CD solver is always better?
Could we reuse some of the benchmarks here: https://github.com/scikit-learn/scikit-learn/tree/master/benchmarks to see in which settings it helps?
I did my benchmarks based on https://github.com/scikit-learn/scikit-learn/blob/master/benchmarks/bench_glmnet.py. If it helps, I could provide my adaptation. Also, I couldn't find a single (in-memory) case where saga is faster. So @ogrisel has a good point in questioning whether to expose saga at all.
It's possible that our saga solver is not optimal (e.g. we do not do any minibatching / importance sampling based on gradient magnitude...). But for now I don't feel like merging this: I see no value in exposing choices that are always found to be empirically suboptimal w.r.t. existing methods already implemented in the library.
I'm not too surprised by this, since CD can do many optimizations in the case of a squared loss, while the saga solver is basically the same as the one for logistic regression. AFAICT the benchmarks are on dense data. A big use case where SAGA shines is when the input data is very sparse (rcv1, url dataset, etc.).
Thanks for your input @fabianp. @lorentzenchr I see you closed the issue; it might still be interesting to run some benchmarks with the SAGA solver on some medium-scale sparse text regression problem, e.g. predicting movie review scores from TF-IDF / bag-of-words features.
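A rough sketch of such a sparse benchmark; the synthetic data stands in for TF-IDF features, and `solver` is this PR's proposed argument, so this is illustrative only:

```python
import time
import numpy as np
import scipy.sparse as sp
from sklearn.linear_model import ElasticNet

rng = np.random.RandomState(0)
# very sparse design matrix, similar in shape to TF-IDF / bag-of-words data
X = sp.random(20_000, 50_000, density=1e-3, format='csr', random_state=rng)
true_coef = sp.random(50_000, 1, density=1e-3,
                      random_state=rng).toarray().ravel()
y = X @ true_coef + 0.01 * rng.randn(20_000)

for solver in ('cd', 'saga'):  # `solver` as proposed in this PR
    tic = time.time()
    ElasticNet(alpha=1e-4, l1_ratio=0.5, solver=solver).fit(X, y)
    print(f"{solver}: {time.time() - tic:.1f}s")
```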
Also for sparse (csr) data, I get similar results as the benchmark on dense data above. I don't dare show the numbers. Conclusion: CD seems to be a factor of 10 faster for elastic net regression (squared error).
Reference Issues/PRs
Fixes #12907.
What does this implement/fix? Explain your changes.
A new argument 'solver' is introduced in ElasticNet. In addition to the default 'cd', one can also choose 'saga' to use the SAGA solver; see also Ridge and LogisticRegression.
The same tests as for coordinate descent are done for saga.
Any other comments?
Open question: so far, ElasticNet via cd does not support sample_weight. It could be added via SAGA. Should this go in this PR or rather in a new issue?
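For concreteness, the API this open question points at might look like the following; neither the `solver` argument (this PR's proposal) nor sample_weight support exists yet, so this is purely hypothetical:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.RandomState(0)
X, y = rng.randn(200, 10), rng.randn(200)
sample_weight = rng.rand(200)

# The internal sag_solver helper already accepts per-sample weights, so
# wiring them through could enable:
ElasticNet(solver='saga').fit(X, y, sample_weight=sample_weight)
```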