
Robust versions of Linear Regression / Lasso / ElasticNet using LAD (L1-loss) #13612


Closed
smarie opened this issue Apr 10, 2019 · 6 comments

@smarie
Contributor

smarie commented Apr 10, 2019

In our team we happened to use L1-loss (LAD, least absolute deviation) linear regression, with or without L1 or L2 regularizers (i.e. the lasso/elastic-net variants). The benefit is robust linear regression, since the L1 loss is much less sensitive to outliers in the data. This approach should also be much faster than, for example, RANSAC, which is available in scikit-learn.
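As a quick illustration of that robustness (just a sketch; I am assuming statsmodels here, whose QuantReg at q=0.5 solves exactly the LAD problem):

```python
# Sketch: L1 (LAD) vs. L2 (OLS) fits on data with a few gross outliers.
# QuantReg at the median (q=0.5) minimizes the sum of absolute residuals.
import numpy as np
import statsmodels.api as sm

rng = np.random.RandomState(0)
x = rng.uniform(0, 10, 100)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=100)
y[:5] += 50.0                       # inject a handful of gross outliers

X = sm.add_constant(x)              # design matrix [1, x]
ols = sm.OLS(y, X).fit()            # L2 loss: pulled toward the outliers
lad = sm.QuantReg(y, X).fit(q=0.5)  # L1 loss: slope stays close to 2
print(ols.params, lad.params)
```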

For the implementation I usually rely on general-purpose solvers (GLPK, CLP); I have developed one in MATLAB and one in Python.

Strangely enough, there does not seem to be an "official paper" for this method, even though it has been known and in use for years. I found this tutorial and this paper, but that's not much.

Since #5851 opens the topic of robust PCA, I thought this would be the right time to open this issue.

It seems that scikit-learn's policy is to always implement a dedicated solver for each method, rather than rely on a generic LP/QP solver such as COIN-OR CLP. Is that correct? If so, we would need to find the right solver; maybe it is just an adaptation of the one already used in ElasticNet, I have not looked yet.

@jnothman
Member

jnothman commented Apr 10, 2019 via email

@smarie
Contributor Author

smarie commented Apr 11, 2019

(answer edited)

It is not clear whether your question is about the CLP solver or about the LAD models themselves, so I'll answer for the first option :)

Before going into details, let me recall that solving L1-only problems such as LAD linear regression requires an LP solver, while problems involving both L1 and L2 terms, or L2 alone, require a QP solver.
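To make the LP reformulation concrete, here is a minimal sketch with scipy (my own illustration, not an official implementation): introduce auxiliary variables t with -t <= y - Xw <= t and minimize their sum.

```python
# Sketch: LAD regression as a linear program, solved with scipy.
# Variables are (w, t); minimize sum(t) subject to -t <= y - X @ w <= t,
# i.e.  X w - t <= y   and   -X w - t <= -y.
import numpy as np
from scipy.optimize import linprog

def lad_fit(X, y):
    n, p = X.shape
    c = np.concatenate([np.zeros(p), np.ones(n)])   # objective: sum(t)
    A_ub = np.block([[X, -np.eye(n)],               #  X w - t <= y
                     [-X, -np.eye(n)]])             # -X w - t <= -y
    b_ub = np.concatenate([y, -y])
    bounds = [(None, None)] * p + [(0, None)] * n   # w free, t >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return res.x[:p]
```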

COIN-OR CLP is a general-purpose LP/QP solver released under the Eclipse Public License 1.0. As such it could effectively provide an alternative to most of the model-specific LP or QP solvers (simplex, gradient descent, and the like) scattered throughout scikit-learn, with possible speed or quality improvements. EDIT: after some searching I found that COIN-OR IPOPT is as good as CLP at solving quadratic problems, and much faster on large problems. It has the same license; see this benchmark and the source of the information.

Direct interfaces

  • CLP: PuLP and CyLP are provided by the COIN-OR organization itself, but they are limited to LP and MILP problems and may require a complex installation, since there is no simple way to install their dependencies on all platforms.

  • IPOPT: cyipopt provides an interface to IPOPT identical to the one in scipy (scipy.optimize.minimize); there is also a more detailed API. It can be installed on all platforms using Anaconda. It is maybe the best choice to move forward, although I have not tried it yet (see the sketch after this list).
    Note that there is also another wrapper, with two versions (it was forked), but it seems less mature.
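For illustration, here is how I imagine the scipy-style entry point would be used (a sketch only; I am assuming cyipopt's minimize_ipopt mirrors scipy.optimize.minimize, and I have not tried it myself):

```python
# Sketch using cyipopt's scipy-compatible interface. IPOPT needs a smooth
# objective, so this shows an L2-regularized least-squares fit, not LAD.
import numpy as np
from cyipopt import minimize_ipopt

X = np.random.randn(50, 3)
y = X @ np.array([1.0, -2.0, 0.5])

def objective(w):
    r = X @ w - y
    return 0.5 * r @ r + 0.1 * w @ w        # least squares + L2 penalty

def gradient(w):
    return X.T @ (X @ w - y) + 0.2 * w

res = minimize_ipopt(objective, x0=np.zeros(3), jac=gradient)
print(res.x)
```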

Frameworks

Pyomo: a framework covering all optimization problem types, including LP and QP. Unfortunately CLP is not available directly inside it, only through CBC (so only the LP functionality). But IPOPT is supported for both QP and LP solving, through the AMPL interface.

So to make a long answer short: no, there is no Python interface to CLP doing both QP and LP as of today, but fortunately there is one for IPOPT that also provides a scipy-compliant API.

Another alternative would be to optionally support Pyomo, so as to support various solvers. That's what I currently use: with the same code I can switch my LAD linear regression implementation from GLPK to CLP to IPOPT, depending on what is available on the platform. In that case each solver has to be installed independently, which is a much more flexible option because users can install the solver of their choice.
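As an illustration of that pattern (a sketch only; the solver binaries must be installed separately):

```python
# Sketch: the LAD model written in Pyomo, with the solver chosen at runtime.
# Swapping "glpk" for "cbc" or "ipopt" only changes the SolverFactory name.
import numpy as np
import pyomo.environ as pyo

X = np.random.randn(30, 2)
y = X @ np.array([3.0, -1.0])
N, P = range(X.shape[0]), range(X.shape[1])

m = pyo.ConcreteModel()
m.w = pyo.Var(P)                                 # coefficients, free
m.t = pyo.Var(N, domain=pyo.NonNegativeReals)    # bounds on |residuals|

def upper(m, i):   # y_i - x_i . w <= t_i
    return y[i] - sum(X[i, j] * m.w[j] for j in P) <= m.t[i]
def lower(m, i):   # -(y_i - x_i . w) <= t_i
    return -(y[i] - sum(X[i, j] * m.w[j] for j in P)) <= m.t[i]

m.up = pyo.Constraint(N, rule=upper)
m.lo = pyo.Constraint(N, rule=lower)
m.obj = pyo.Objective(expr=sum(m.t[i] for i in N), sense=pyo.minimize)

pyo.SolverFactory("glpk").solve(m)               # or "cbc", "ipopt", ...
print([pyo.value(m.w[j]) for j in P])
```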

Note that this broader, longer-term topic of integrating nicely with alternative solvers should probably be discussed in a separate issue (something like "Include support for generic LP/QP solvers as an alternative"), so as to keep the current issue focused on LAD models only.

@agramfort
Member

agramfort commented Apr 13, 2019 via email

@smarie
Contributor Author

smarie commented Apr 13, 2019

Thanks for the link! I know that scipy has an LP solver, but I was tempted to think it would be much slower and less robust than one of the "serious" solvers that people in the optimization domain use. Honestly, though, I did not check or compare, so this comment does not have much validity :) I could not find many comparison references, only this open discussion.

Note that using an L1 loss in a general elastic-net model would require a QP solver, not an LP one.
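Spelling that out (my own formulation of the standard epigraph trick, not taken from a reference):

```latex
% Elastic net with an L1 loss: the squared L2 penalty term makes this a QP.
\min_{w,\,t,\,u}\;
  \mathbf{1}^\top t
  + \lambda \alpha\, \mathbf{1}^\top u
  + \tfrac{\lambda (1-\alpha)}{2}\, \lVert w \rVert_2^2
\quad \text{s.t.} \quad
  -t \le y - Xw \le t, \qquad
  -u \le w \le u
```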

@jeromedockes
Contributor

It seems that #9978 will add quantile regression to scikit-learn.

@lorentzenchr
Member

Solved by #9978.
