
[MRG] META Add Generalized Linear Models #9405


Closed · wants to merge 72 commits

Commits
d5e8810
[WIP] Add Generalized Linear Model, issue #5975, initial commit
Jul 18, 2017
2fc189d
[WIP] Add Generalized Linear Models (#9405)
Jul 19, 2017
a6137d8
[WIP] Add Generalized Linear Models (#9405)
Jul 19, 2017
b0be167
[WIP] Add Generalized Linear Models (#9405)
Aug 9, 2017
85c52ec
[WIP] Add Generalized Linear Models (#9405)
Aug 12, 2017
0f4bdb3
[WIP] Add Generalized Linear Models (#9405)
Sep 18, 2017
5b46c23
[WIP] Add Generalized Linear Models (#9405)
Sep 18, 2017
10dd146
[WIP] Add Generalized Linear Models (#9405)
Dec 3, 2017
72485b6
[WIP] Add Generalized Linear Models (#9405)
Jan 8, 2018
5c1369b
[WIP] Add Generalized Linear Models (#9405)
Jan 24, 2018
91497a2
[WIP] Add Generalized Linear Models (#9405)
Jan 24, 2018
b9e5105
[WIP] Add Generalized Linear Models (#9405)
Jan 24, 2018
e317422
[WIP] Add Generalized Linear Models (#9405)
Jan 25, 2018
9a98184
[WIP] Add Generalized Linear Models (#9405)
Jan 25, 2018
db9defe
[WIP] Add Generalized Linear Models (#9405)
Jan 26, 2018
dc7fdd7
[WIP] Add Generalized Linear Models (#9405)
Jan 26, 2018
b11d06b
[WIP] Add Generalized Linear Models (#9405)
Jan 26, 2018
9e6c013
[WIP] Add Generalized Linear Models (#9405)
Jan 26, 2018
bad0190
[WIP] Add Generalized Linear Models (#9405)
Jan 27, 2018
48137d8
[WIP] Add Generalized Linear Models (#9405)
Jan 28, 2018
2c2a077
[WIP] Add Generalized Linear Models (#9405)
Jan 28, 2018
15931c3
[WIP] Add Generalized Linear Models (#9405)
Jan 28, 2018
feedba3
[MRG] Add Generalized Linear Models (#9405)
Mar 30, 2018
6fdfb47
[MRG] Add Generalized Linear Models (#9405)
Mar 30, 2018
d489f56
[MRG] Add Generalized Linear Models (#9405)
Aug 5, 2018
809e3a2
Remove test_glm_P2_argument
Aug 26, 2018
4edce36
Filter out DeprecationWarning in old versions of scipy.sparse.linalg.…
Aug 30, 2018
46df5b6
import pytest
Aug 30, 2018
21f2136
Document arguments of abstract methods
Aug 30, 2018
1faedf8
Pytest filter warnings use two colons
Aug 30, 2018
992f981
Improve documentation of arguments that were so far undocumented
Aug 30, 2018
06b8451
Further improve documentation of arguments
Aug 30, 2018
c93f60d
Remove parameters docstring for __init__
Aug 31, 2018
66ec63b
Fix typos in docstring of TweedieDistribution
Aug 31, 2018
53c6970
Change docstring section of TweedieDistribution from Attributes to Pa…
Aug 31, 2018
87d5ba3
Minor doc improvements of GeneralizedLinearRegressor
Oct 7, 2018
a9ae023
Double escape in docstring of GeneralizedLinearRegressor
Oct 8, 2018
bb62485
Add example for GeneralizedLinearRegressor
Dec 31, 2018
16d064d
Resolve merge conflicts
Jan 1, 2019
1a02a90
Adapt for minimum numpy version
Jan 1, 2019
177eb4c
Remove six dependencies as in #12639
Jan 6, 2019
3d4c784
Improve user guide, doc and fix penalty parameter for Ridge
Feb 3, 2019
919912c
Smarter intercept initialization and docstring improvements
Feb 17, 2019
01033e3
Fix false formula in starting_mu and improve start_params
Feb 20, 2019
4071a8a
Improve argument handling of P1 and P2
Feb 20, 2019
757bc3c
Fix doctest, test_poisson_enet, change IRLS to use lstsq, fix input c…
Feb 20, 2019
ed8e74f
Use pytest decorators and pytest.raises
Feb 23, 2019
fe876da
Add Logistic regression=Binomial + Logit
Feb 24, 2019
2993e03
More efficient sparse matrices and refactor of irls and cd solver
Apr 7, 2019
a6f9f13
Treat the intercept separately, i.e. X, P1, P2 never include intercept
Apr 20, 2019
c9a7a95
Revised option start_params
Apr 21, 2019
a7755de
Fix a few typos
rth Jun 4, 2019
9aa1fc4
Make module private
rth Jun 4, 2019
ca3eae2
Working on tests
rth Jun 4, 2019
61bc6b8
Improve tests
rth Jun 5, 2019
b24a7ca
Remove unused dec parameter in tests
rth Jun 5, 2019
f95b390
ENH: add Generalized Linear Models, issue #5975
Jul 18, 2017
09176b4
MAINT: merge branch 'GLM-impr' of https://github.com/rth/scikit-learn
Jun 9, 2019
def12ae
[MAINT] make glm private, fix typos, improve tests
Jun 9, 2019
9b574bd
Fix docstrings for the new print_changed_only=True by default
rth Jun 11, 2019
90299fd
Increase coverage
rth Jun 12, 2019
e3a5a9a
More tests and addressing some review comments
rth Jun 12, 2019
54b80b8
TST More specific checks of error messages in tests
rth Jun 13, 2019
e962859
Merge branch 'master' into GLM
rth Jun 27, 2019
7db0320
Add PoissonRegressor alias
rth Jun 14, 2019
dcfe9ed
TST Simplify comparison with ridge
rth Jun 27, 2019
4879bb6
EXA Add plot_tweedie_regression_insurance_claims.py
rth Jun 28, 2019
56069e5
EXA Fix issues with older pandas versions in example
rth Jun 28, 2019
53f3c5f
DOC Add second poisson regression example
rth Jul 9, 2019
e58d8e3
EXA wording and score in plot_tweedie_regression_insurance_claims.html
Jul 14, 2019
c3fc392
Address review comments
rth Jul 15, 2019
98054bc
fix sparse P2 cases
Nov 27, 2019
1 change: 1 addition & 0 deletions doc/modules/classes.rst
@@ -727,6 +727,7 @@ Kernels:
linear_model.BayesianRidge
linear_model.ElasticNet
linear_model.ElasticNetCV
linear_model.GeneralizedLinearRegressor
linear_model.HuberRegressor
linear_model.Lars
linear_model.LarsCV
129 changes: 126 additions & 3 deletions doc/modules/linear_model.rst
@@ -67,7 +67,7 @@ Ordinary Least Squares Complexity

The least squares solution is computed using the singular value
decomposition of X. If X is a matrix of shape `(n_samples, n_features)`
this method has a cost of
:math:`O(n_{\text{samples}} n_{\text{features}}^2)`, assuming that
:math:`n_{\text{samples}} \geq n_{\text{features}}`.

@@ -430,7 +430,7 @@ between the features.

The advantages of LARS are:

- It is numerically efficient in contexts where the number of features
is significantly greater than the number of samples.

- It is computationally just as fast as forward selection and has
@@ -732,7 +732,7 @@ classifier. In this model, the probabilities describing the possible outcomes
of a single trial are modeled using a
`logistic function <https://en.wikipedia.org/wiki/Logistic_function>`_.

Logistic regression is implemented in :class:`LogisticRegression`.
This implementation can fit binary, One-vs-Rest, or multinomial logistic
regression with optional :math:`\ell_1`, :math:`\ell_2` or Elastic-Net
regularization.
@@ -888,6 +888,129 @@ to warm-starting (see :term:`Glossary <warm_start>`).
.. [9] `"Performance Evaluation of Lbfgs vs other solvers"
<http://www.fuzihao.org/blog/2016/01/16/Comparison-of-Gradient-Descent-Stochastic-Gradient-Descent-and-L-BFGS/>`_

.. _Generalized_linear_regression:

Generalized Linear Regression
=============================

:class:`GeneralizedLinearRegressor` generalizes the :ref:`elastic_net` in two
ways [10]_. First, the predicted values :math:`\hat{y}` are linked to a linear
combination of the input variables :math:`X` via an inverse link function
:math:`h` as

.. math:: \hat{y}(w, x) = h(xw) = h(w_0 + w_1 x_1 + ... + w_p x_p).

Second, the squared loss is replaced by the deviance :math:`D` of an
exponential dispersion model (EDM) [11]_. The objective function to be
minimized becomes

.. math:: \frac{1}{2\,\mathrm{sum}(s)} D(y, \hat{y}; s) + \alpha \rho ||P_1 w||_1
          + \frac{\alpha(1-\rho)}{2} w^T P_2 w

with sample weights :math:`s`.
The diagonal matrix :math:`P_1` can be used to exclude some of the
coefficients from the L1 penalty, while the positive semi-definite matrix
:math:`P_2` allows for a more versatile L2 penalty.
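
For example, a minimal sketch of excluding a coefficient from the L1 penalty
(assuming, as illustration only, that ``P1`` accepts a per-coefficient array
where a zero entry drops that coefficient from the L1 term, and that the
mixing parameter :math:`\rho` is exposed as ``l1_ratio``)::

    import numpy as np
    from sklearn.linear_model import GeneralizedLinearRegressor

    X = np.array([[1.0, 0.5], [2.0, 1.0], [3.0, 2.0], [4.0, 3.0]])
    y = np.array([1.0, 2.0, 4.0, 8.0])

    # P1 = [0, 1]: no L1 penalty on the first coefficient, full L1 penalty
    # on the second; the L2 term still applies to both coefficients.
    reg = GeneralizedLinearRegressor(alpha=0.5, l1_ratio=0.5,
                                     family='poisson', link='log',
                                     P1=[0.0, 1.0])
    reg.fit(X, y)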

Use cases where a loss different from the squared loss might be appropriate
include the following:

* If the target values :math:`y` are counts (non-negative integer valued) or
frequencies (non-negative), you might use a Poisson deviance with log-link.

* If the target values are positive valued and skewed, you might try a
Gamma deviance with log-link.

* If the target values seem to be heavier tailed than a Gamma distribution,
you might try an Inverse Gaussian deviance (or even higher variance powers
of the Tweedie family).

Since the linear predictor :math:`Xw` can be negative, while the Poisson,
Gamma and Inverse Gaussian distributions do not support negative values, it is
convenient to apply an inverse link function different from the identity
:math:`h(Xw)=Xw` that guarantees non-negative predictions, e.g. the log-link
with :math:`h(Xw)=\exp(Xw)`.
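
As a compact illustration of these choices (the family strings match the
table in the next subsection; the log-link keeps predictions positive in all
three cases)::

    from sklearn.linear_model import GeneralizedLinearRegressor

    # Counts or frequencies (non-negative targets).
    reg_counts = GeneralizedLinearRegressor(family='poisson', link='log')
    # Positive, skewed targets.
    reg_skewed = GeneralizedLinearRegressor(family='gamma', link='log')
    # Targets with heavier tails than a Gamma distribution.
    reg_heavy = GeneralizedLinearRegressor(family='inverse.gaussian',
                                           link='log')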

Note that the feature matrix `X` should be standardized before fitting. This
ensures that the penalty treats features equally. The estimator can be used as
follows:

>>> from sklearn.linear_model import GeneralizedLinearRegressor
>>> reg = GeneralizedLinearRegressor(alpha=0.5, family='poisson', link='log')
>>> reg.fit([[0, 0], [0, 1], [2, 2]], [0, 1, 2])
GeneralizedLinearRegressor(alpha=0.5, family='poisson', link='log')
>>> reg.coef_
array([0.24630169, 0.43373464])
>>> reg.intercept_
-0.76383633...
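
Since the penalty treats features equally only on a common scale, a minimal
sketch of standardizing within a pipeline (standard scikit-learn tools; the
hyperparameters are illustrative only)::

    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import GeneralizedLinearRegressor

    # Scale features to zero mean and unit variance before fitting, so the
    # penalty acts on comparable coefficients.
    model = make_pipeline(
        StandardScaler(),
        GeneralizedLinearRegressor(alpha=0.5, family='poisson', link='log'),
    )
    model.fit([[0, 0], [0, 1], [2, 2]], [0, 1, 2])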


.. topic:: Examples:

* :ref:`sphx_glr_auto_examples_linear_model_plot_poisson_spline_regression.py`

Mathematical formulation
------------------------

In the unpenalized case, the assumptions are the following:

* The target values :math:`y_i` are realizations of random variables
  :math:`Y_i \overset{\mathrm{i.i.d.}}{\sim} \mathrm{EDM}(\mu_i, \frac{\phi}{s_i})`
  with expectation :math:`\mu_i=\mathrm{E}[Y_i]`, dispersion parameter
  :math:`\phi` and sample weights :math:`s_i`.
* The aim is to predict the expectation :math:`\mu_i` with
  :math:`\hat{y}_i = h(\eta_i)`, linear predictor
  :math:`\eta_i=(Xw)_i` and inverse link function :math:`h(\eta)`.

Note that the first assumption implies
:math:`\mathrm{Var}[Y_i]=\frac{\phi}{s_i} v(\mu_i)` with unit variance
function :math:`v(\mu)`. Specifying a particular distribution of an EDM is the
same as specifying a unit variance function (they are one-to-one).

Including penalties helps to avoid overfitting or, in the case of an L1
penalty, to obtain sparse solutions. There are also other motivations to
include them, e.g. accounting for the dependence structure of :math:`y`.

The objective function, which is independent of :math:`\phi`, is minimized with
respect to the coefficients :math:`w`.

The unit deviance :math:`d(y, \mu)` and the total deviance
:math:`D(y, \mu; s)` are defined in terms of the log-likelihood of the
:math:`\mathrm{EDM}(\mu, \phi)` as

.. math:: d(y, \mu) = -2\phi\cdot
          \left(\mathrm{loglike}(y,\mu,\phi)
          - \mathrm{loglike}(y,y,\phi)\right) \\
          D(y, \mu; s) = \sum_i s_i \cdot d(y_i, \mu_i)

===================================== =============================== ================================= ============================================
Distribution Target Domain Variance Function :math:`v(\mu)` Unit Deviance :math:`d(y, \mu)`
===================================== =============================== ================================= ============================================
Normal ("normal") :math:`y \in (-\infty, \infty)` :math:`1` :math:`(y-\mu)^2`
Poisson ("poisson") :math:`y \in [0, \infty)` :math:`\mu` :math:`2(y\log\frac{y}{\mu}-y+\mu)`
Gamma ("gamma") :math:`y \in (0, \infty)` :math:`\mu^2` :math:`2(\log\frac{\mu}{y}+\frac{y}{\mu}-1)`
Inverse Gaussian ("inverse.gaussian") :math:`y \in (0, \infty)` :math:`\mu^3` :math:`\frac{(y-\mu)^2}{y\mu^2}`
===================================== =============================== ================================= ============================================
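
For example, for the Poisson distribution (:math:`\phi = 1`, log-likelihood
:math:`y\log\mu - \mu - \log(y!)`), the :math:`\log(y!)` terms cancel in the
definition above and the unit deviance of the table is recovered:

.. math:: d(y, \mu) = -2\left((y\log\mu - \mu) - (y\log y - y)\right)
          = 2\left(y\log\frac{y}{\mu} - y + \mu\right)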

Two remarks:

* The deviances for at least the Normal, Poisson and Gamma distributions are
  strictly consistent scoring functions for the mean :math:`\mu`; see Eqs.
  (19)-(20) in [12]_.

* If you want to model a frequency, i.e. counts per exposure (time, volume,
  ...), you can do so by using a Poisson distribution and passing
  :math:`y=\frac{\mathrm{counts}}{\mathrm{exposure}}` as target values
  together with :math:`s=\mathrm{exposure}` as sample weights, as in the
  sketch below.
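
A minimal sketch of this setup (hypothetical ``counts`` and ``exposure``
arrays; assuming ``fit`` accepts ``sample_weight`` as elsewhere in
scikit-learn)::

    import numpy as np
    from sklearn.linear_model import GeneralizedLinearRegressor

    X = np.array([[0.0], [1.0], [2.0], [3.0]])
    counts = np.array([0.0, 1.0, 3.0, 10.0])    # observed event counts
    exposure = np.array([1.0, 2.0, 2.0, 4.0])   # e.g. observation time

    # Model the frequency counts / exposure and weight each observation by
    # its exposure.
    reg = GeneralizedLinearRegressor(family='poisson', link='log')
    reg.fit(X, counts / exposure, sample_weight=exposure)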


.. topic:: References:

.. [10] McCullagh, Peter; Nelder, John (1989). Generalized Linear Models, Second Edition. Boca Raton: Chapman and Hall/CRC. ISBN 0-412-31760-5.

.. [11] Jørgensen, B. (1992). The theory of exponential dispersion models and analysis of deviance. Monografias de matemática, no. 51.
See also `Exponential dispersion model. <https://en.wikipedia.org/wiki/Exponential_dispersion_model>`_

.. [12] Gneiting, T. (2010). `Making and Evaluating Point Forecasts. <https://arxiv.org/pdf/0912.0902.pdf>`_

Stochastic Gradient Descent - SGD
=================================