Robust versions of Linear Regression / Lasso / ElasticNet using LAD (L1-loss) #13612
Comments
We tend to avoid adding dependencies beyond scipy. Is there an
implementation with a scikit-learn-compatible API outside this library?
|
(answer edited) It is not clear whether your question is about the CLP solver or about the LAD models themselves; I will answer for the first option :)

Before going into details, recall that solving L1-only problems such as LAD linear regression requires an LP solver, while solving problems with L1 and L2 terms, or L2 alone, requires a QP solver. COIN-OR CLP is a general-purpose LP/QP solver released under the Eclipse Public License 1.0. As such it could provide an alternative to most of the model-specific LP or QP solvers (simplex, gradient descent, and the like) that are found here and there in scikit-learn, with possible speed or quality improvements.

EDIT: after some searching I found that COIN-OR IPOPT is as good as CLP for solving quadratic problems, and much faster for large problems. It has the same license. See this benchmark. And the source for the information.

Direct interfaces
Frameworks

Pyomo: a framework that handles all optimization problem types, including LP and QP. Unfortunately CLP is not available directly inside it, only through CBC (so only for LP functionality). But IPOPT is supported for QP and LP solving, through the AMPL interface.

So to make the answer short: no, there is no Python implementation of CLP doing both QP and LP as of today, but fortunately there is one for IPOPT that also provides a scipy-compliant API.

Another alternative would be to optionally support Pyomo, so as to support various solvers. That's what I currently use: in the same code I can switch my LAD linear regression implementation from GLPK to CLP to IPOPT, depending on what is available on the platform. In that case each solver has to be installed independently, which is a much more flexible option because users can install the solver of their choice.

Note that this broader and longer-term topic of a nice integration with alternative solvers should probably be discussed in another issue, such as "Include support for generic LP/QP solvers as an alternative", so as to keep the current issue about LAD models only. |
You have an LP solver in recent scipy.
See
https://github.com/flatironinstitute/least_absolute_regression/blob/master/lae_regression/lae_regression/least_abs_err_regression.py#L36
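For illustration, the LP route could look roughly like the sketch below. It is not taken from the linked file; it assumes scipy >= 1.6 (for `method="highs"`), and `lad_fit` is a hypothetical helper name. Each residual is split into a positive and a negative part so that the sum of absolute residuals becomes a linear objective.

```python
import numpy as np
from scipy.optimize import linprog


def lad_fit(X, y):
    """Least absolute deviations fit written as a linear program.

    Hypothetical helper, not part of scipy or scikit-learn.
    Residuals are split as r = u - v with u, v >= 0, so that
    sum(u + v) equals sum(|r|) at the optimum.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    A = np.hstack([X, np.ones((n, 1))])  # design matrix plus intercept column
    # Variable layout: [coefficients (p), intercept (1), u (n), v (n)]
    c = np.concatenate([np.zeros(p + 1), np.ones(2 * n)])
    A_eq = np.hstack([A, np.eye(n), -np.eye(n)])  # A @ beta + u - v = y
    bounds = [(None, None)] * (p + 1) + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p], res.x[p]


# Synthetic data: y = 2 x + 1 plus small noise, with 10% gross outliers.
rng = np.random.default_rng(0)
X = rng.uniform(-5, 5, size=(100, 1))
y = 2.0 * X[:, 0] + 1.0 + rng.normal(scale=0.1, size=100)
y[:10] += 50.0

coef, intercept = lad_fit(X, y)
print(coef, intercept)
```

The dense identity blocks keep the sketch short; for large `n` a sparse constraint matrix would be the more reasonable choice.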
Also note that LAD is quantile regression with the 0.5 quantile. statsmodels
seems to have a solver based on reweighted least squares, which would be
trivial to do with any scipy version (although maybe slower than using an
LP solver).
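A minimal sketch of the reweighted least squares idea, using only numpy. This is not the statsmodels implementation; `lad_irls`, the iteration count, the clipping constant `eps` and the stopping tolerance are all assumptions made for the example. Each pass solves a weighted least squares problem with weights 1/|r|, so the weighted squared loss approximates the sum of absolute residuals.

```python
import numpy as np


def lad_irls(X, y, n_iter=100, eps=1e-6, tol=1e-8):
    """Approximate LAD fit by iteratively reweighted least squares.

    Hypothetical helper. Weights 1/|r| are clipped at 1/eps to avoid
    division by zero when a residual vanishes.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.hstack([X, np.ones((X.shape[0], 1))])  # add intercept column
    beta = np.linalg.lstsq(A, y, rcond=None)[0]   # ordinary least squares start
    for _ in range(n_iter):
        r = y - A @ beta
        w = 1.0 / np.maximum(np.abs(r), eps)
        sw = np.sqrt(w)
        # Weighted least squares: minimize sum(w * residual**2)
        beta_new = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)[0]
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta[:-1], beta[-1]


# Same kind of synthetic data: y = 2 x + 1 with 10% gross outliers.
rng = np.random.default_rng(0)
X = rng.uniform(-5, 5, size=(100, 1))
y = 2.0 * X[:, 0] + 1.0 + rng.normal(scale=0.1, size=100)
y[:10] += 50.0

coef_irls, intercept_irls = lad_irls(X, y)
print(coef_irls, intercept_irls)
```

This only approximates the LP solution, and convergence can be slow when many residuals are close to zero, which is consistent with the caveat about speed above.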
|
Thanks for the link! I know that scipy has an LP solver, but I was tempted to think that it would be considerably slower and less robust than one of the "serious" solvers that people from the optimization domain use. Honestly, I did not check or compare, so this comment does not have much validity :) I could not find many comparison references, only this open discussion. Note that using an L1 norm in a general elastic net model would require a QP solver, not an LP solver. |
It seems that #9978 will add quantile regression to scikit-learn. |
Solved by #9978. |
In our team we happen to use L1-loss (LAD, least absolute deviations) linear regression, with or without L1 or L2 regularizers (so the lasso/elasticnet variants). The interest is to have a robust linear regression, as the L1 loss is much less sensitive to outliers in the data. That solution is supposed to be much faster than, for example, RANSAC, which is available in scikit-learn. I usually rely on general solvers (GLPK, CLP) for the implementation; I have developed one in MATLAB and one in Python.
Strangely enough there does not seem to be an "official paper" for this method, although it has been known and around for years. I found this tutorial and this paper but that's not much.
Since #5851 opens the topic of robust PCA, I thought this would be the right time to open this issue.
It seems that scikit-learn's policy is to always implement a dedicated solver for each method, rather than rely on a generic LP/QP solver such as COIN-OR CLP. Is that correct? If so, we would need to find the right solver; maybe it is just an adaptation of the one already there in ElasticNet, I have not looked yet.