[WIP] Implement Gini coefficient for model selection with positive regression GLMs #15176

ogrisel · 2019-10-10T17:11:40Z

Fixes #10003. This PR builds on top of #14300.

I plan to rebase the last 2 commits on top of #14300 from time to time.

TODO:

* Add tests
* Add documentation
* Add a helper plot_lorenz_curve function?
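
A minimal sketch of what a Lorenz curve helper and the derived Gini coefficient could look like for ranking evaluation. The names `lorenz_curve` and `gini_coefficient`, their signatures, and the choice to treat `y_true` as a rate per unit of weight (for example, claim amount per unit of exposure) are assumptions for illustration, not the code in this PR.

```python
import numpy as np


def lorenz_curve(y_true, y_pred, sample_weight=None):
    """Hypothetical helper: Lorenz curve of y_true ordered by y_pred.

    Returns the cumulative share of weight (x-axis) and the cumulative
    share of the weighted target (y-axis), both starting at 0.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if sample_weight is None:
        sample_weight = np.ones_like(y_true)
    sample_weight = np.asarray(sample_weight, dtype=float)

    # Rank samples from lowest to highest predicted value.
    order = np.argsort(y_pred, kind="stable")
    w = sample_weight[order]
    wy = w * y_true[order]

    cum_weight = np.concatenate([[0.0], np.cumsum(w)]) / w.sum()
    cum_target = np.concatenate([[0.0], np.cumsum(wy)]) / wy.sum()
    return cum_weight, cum_target


def gini_coefficient(y_true, y_pred, sample_weight=None):
    """Hypothetical helper: 1 - 2 * (area under the Lorenz curve)."""
    x, y = lorenz_curve(y_true, y_pred, sample_weight)
    # Trapezoidal rule written out explicitly.
    area = np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0)
    return 1.0 - 2.0 * area


# A model that ranks the single non-zero target last.
gini_ranked = gini_coefficient([0, 0, 0, 1], [0.1, 0.2, 0.3, 0.4])
# A constant target gives the diagonal Lorenz curve.
gini_constant = gini_coefficient([2, 2, 2, 2], [4, 3, 2, 1])
```

With this convention a constant target yields a Gini coefficient of 0, and a model that concentrates all of the target mass in its top-ranked samples yields a value closer to 1.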

* Fixed pep8
* Fixed flake8
* Rename GeneralizedLinearModel as GeneralizedLinearRegressor
* Use of six.with_metaclass
* PEP257: summary should be on same line as quotes
* Docstring of class GeneralizedLinearRegressor: \ before mu
* Arguments family and link accept strings
* Use of ConvergenceWarning

* GeneralizedLinearRegressor added to doc/modules/classes.rst

* fixed bug: init parameter max_iter
* fix API for family and link: default parameter changed to string; non-public variables self._family_instance and self._link_instance
* fixed bug in score, minus sign forgotten
* added check_is_fitted to estimate_phi and score
* added check_array(X) in predict
* replaced lambda functions in TweedieDistribution
* some documentation

* make raw docstrings where appropriate
* make ExponentialDispersionModel (i.e. TweedieDistribution) picklable: ExponentialDispersionModel has new properties include_lower_bound, method in_y_range is not abstract anymore.
* set self.intercept_=0 if fit_intercept=False, such that it is always defined.
* set score to D2, a generalized R2 with deviance instead of squared error, as does glmnet. This also solves issues with check_regressors_train(GeneralizedLinearRegressor), which assumes R2 score.
* change of names: weight to weights in ExponentialDispersionModel and to sample_weight in GeneralizedLinearRegressor
* add class method linear_predictor

* added L2 penalty
* api change: alpha, l1_ratio, P1, P2, warm_start, check_input, copy_X
* added entry in user guide
* improved docstrings
* helper function _irls_step

* fix some bugs in user guide linear_model.rst
* fix some pep8 issues in test_glm.py

* added test: ridge poisson with log-link compared to glmnet
* fix ValueError message for l1_ratio
* fix ValueError message for P2
* string comparison: use '==' and '!=' instead of 'is' and 'is not'
* fix RuntimeWarnings in unit_deviance of poisson: x*log(x) as xlogy
* added test for fisher matrix
* added test for family argument
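
A minimal sketch of two items from these commit messages: writing the Poisson unit deviance with `scipy.special.xlogy` so that `y = 0` does not trigger a RuntimeWarning, and a D2 score that replaces squared error with deviance. The function names and signatures below are illustrative assumptions, not the code from these commits.

```python
import numpy as np
from scipy.special import xlogy


def poisson_deviance(y, mu, sample_weight=None):
    """Hypothetical helper: weighted mean Poisson deviance, for mu > 0."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    # xlogy(a, b) computes a * log(b) and returns 0 when a == 0,
    # so samples with y == 0 produce no log(0) warning.
    unit_deviance = 2.0 * (xlogy(y, y / mu) - y + mu)
    return np.average(unit_deviance, weights=sample_weight)


def d2_score(y, mu, sample_weight=None):
    """Hypothetical helper: 1 - deviance(model) / deviance(null model)."""
    y = np.asarray(y, dtype=float)
    # The null model predicts the (weighted) mean of y everywhere.
    y_null = np.full_like(y, np.average(y, weights=sample_weight))
    deviance = poisson_deviance(y, mu, sample_weight)
    null_deviance = poisson_deviance(y, y_null, sample_weight)
    return 1.0 - deviance / null_deviance


y_obs = np.array([0.0, 1.0, 2.0, 3.0])
score_model = d2_score(y_obs, np.array([0.5, 1.0, 2.0, 3.0]))
score_null = d2_score(y_obs, np.full(4, 1.5))
```

Under these assumptions, predicting the mean everywhere scores 0 and a better-than-mean model scores between 0 and 1, mirroring how R2 behaves for squared error.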

* put arguments P1, P2 and check_input from fit to __init__
* added check_input test: is P2 positive definite?
* added solver option: 'auto'

* added coordinate descent solver
* skip doctest for GeneralizedLinearRegressor example
* symmetrize P2 => use P2 = 1/2 (P2+P2')
* better validation of parameter start_params

* bug for sparse matrices for newton-cg solver, function grad_hess
* reduce precision for solver newton-cg in test_poisson_ridge
* remedy doctest issues in linear_model.rst for example of GeneralizedLinearRegressor
* remove unused import of xrange from six

* bug in cd solver for sparse matrices
* higher precision (smaller tol) in test_normal_ridge for sparse matrices
* for each solver a separate precision (tol) in test_poisson_ridge

* improved documentation
* additional option 'zero' for argument start_params
* validation of sample_weight in function predict
* input validation of estimate_phi
* set default fit_dispersion=None
* bug in estimate_phi because of weight rescaling
* test for estimate_phi in normal ridge regression
* extended tests for elastic net poisson

* new helper function _check_weights for validation of sample_weight
* fix white space issue in doctest of linear_model.rst

* fit_dispersion default=None also in docs.
* improved docs.
* fixed input validation of predict
* fixed bug for sample_weight in estimate_phi

* improved docs

* fixed input validation of X in predict

* redundant line of code 'd = np.zeros_like(coef)'

* added test to compare to ElasticNet
* deleted identical comment lines

* increased precision in test_normal_enet

* better doc for heavy tailed distributions

* improved input validation and testing of them

* improved input validation and testing of P1
* test case for validation of argument P2
* test case for validation of argument copy_X

* fix doctest failure in example of linear_model.rst
* fix dtype issue in test_glm_P2_argument

* fix typos in doc

ogrisel · 2019-10-11T08:19:16Z

The tests fail because assigning zero sample_weight to some samples is not equivalent to dropping those samples. I am not sure what would be the correct way to deal with variable exposure in a Lorenz curve.
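
One way such an inequivalence can arise is sketched below. This is an assumption about a possible cause, not a diagnosis of the code in this PR: if the x-axis of the Lorenz curve advances by one step per sample, a zero-weight sample still stretches the curve, whereas an x-axis built from cumulative weight shares ignores it. The helper `gini_from_lorenz` and its `x_axis` parameter are hypothetical.

```python
import numpy as np


def gini_from_lorenz(y_true, y_pred, sample_weight, x_axis):
    """Hypothetical helper: Gini from a Lorenz curve ordered by y_pred.

    x_axis="count": every sample advances the x-axis by 1 / n_samples.
    x_axis="weight": every sample advances the x-axis by its weight share.
    """
    y_true = np.asarray(y_true, dtype=float)
    w = np.asarray(sample_weight, dtype=float)
    order = np.argsort(np.asarray(y_pred, dtype=float), kind="stable")
    w, wy = w[order], (w * y_true)[order]

    if x_axis == "count":
        x = np.linspace(0.0, 1.0, len(w) + 1)
    elif x_axis == "weight":
        x = np.concatenate([[0.0], np.cumsum(w)]) / w.sum()
    else:
        raise ValueError("x_axis must be 'count' or 'weight'")

    y = np.concatenate([[0.0], np.cumsum(wy)]) / wy.sum()
    # Trapezoidal area under the Lorenz curve.
    area = np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0)
    return 1.0 - 2.0 * area


y_true = np.array([1.0, 0.0, 3.0, 2.0])
y_pred = np.array([0.5, 0.1, 2.0, 1.5])
weights = np.array([1.0, 0.0, 2.0, 1.0])  # second sample has zero weight
keep = weights > 0

results = {}
for x_axis in ("count", "weight"):
    zeroed = gini_from_lorenz(y_true, y_pred, weights, x_axis)
    dropped = gini_from_lorenz(
        y_true[keep], y_pred[keep], weights[keep], x_axis
    )
    results[x_axis] = (zeroed, dropped)
    print(x_axis, zeroed, dropped)
```

In this sketch the "count" variant gives different values for the zero-weighted and the dropped data, while the "weight" variant gives the same value for both. Whether a weight-based x-axis is the right treatment of variable exposure is the open question raised in the comment.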

lorentzenchr · 2020-07-04T10:21:41Z

@ogrisel Could you rebase or merge master now that #14300 has been merged? Or is there another PR?

Christian Lorentzen added 30 commits January 7, 2019 20:00

[WIP] Add Generalized Linear Model, issue scikit-learn#5975, initial …

d5e8810

…commit

[WIP] Add Generalized Linear Models (scikit-learn#9405)

a6137d8

* GeneralizedLinearRegressor added to doc/modules/classes.rst

[WIP] Add Generalized Linear Models (scikit-learn#9405)

0f4bdb3

* added L2 penalty
* api change: alpha, l1_ratio, P1, P2, warm_start, check_input, copy_X
* added entry in user guide
* improved docstrings
* helper function _irls_step

[WIP] Add Generalized Linear Models (scikit-learn#9405)

5b46c23

* fix some bugs in user guide linear_model.rst
* fix some pep8 issues in test_glm.py

[WIP] Add Generalized Linear Models (scikit-learn#9405)

72485b6

* put arguments P1, P2 and check_input from fit to __init__
* added check_input test: is P2 positive definite?
* added solver option: 'auto'

[WIP] Add Generalized Linear Models (scikit-learn#9405)

5c1369b

* added coordinate descent solver
* skip doctest for GeneralizedLinearRegressor example
* symmetrize P2 => use P2 = 1/2 (P2+P2')
* better validation of parameter start_params

[WIP] Add Generalized Linear Models (scikit-learn#9405)

b9e5105

* bug in cd solver for sparse matrices
* higher precision (smaller tol) in test_normal_ridge for sparse matrices
* for each solver a separate precision (tol) in test_poisson_ridge

[WIP] Add Generalized Linear Models (scikit-learn#9405)

9a98184

* new helper function _check_weights for validation of sample_weight
* fix white space issue in doctest of linear_model.rst

[WIP] Add Generalized Linear Models (scikit-learn#9405)

db9defe

* fit_dispersion default=None also in docs.
* improved docs.
* fixed input validation of predict
* fixed bug for sample_weight in estimate_phi

[WIP] Add Generalized Linear Models (scikit-learn#9405)

dc7fdd7

* improved docs

[WIP] Add Generalized Linear Models (scikit-learn#9405)

b11d06b

* fixed input validation of X in predict

[WIP] Add Generalized Linear Models (scikit-learn#9405)

9e6c013

* redundant line of code 'd = np.zeros_like(coef)'

[WIP] Add Generalized Linear Models (scikit-learn#9405)

bad0190

* added test to compare to ElasticNet
* deleted identical comment lines

[WIP] Add Generalized Linear Models (scikit-learn#9405)

48137d8

* increased precision in test_normal_enet

[WIP] Add Generalized Linear Models (scikit-learn#9405)

2c2a077

* better doc for heavy tailed distributions

[WIP] Add Generalized Linear Models (scikit-learn#9405)

15931c3

* improved input validation and testing of them

[MRG] Add Generalized Linear Models (scikit-learn#9405)

feedba3

* improved input validation and testing of P1
* test case for validation of argument P2
* test case for validation of argument copy_X

[MRG] Add Generalized Linear Models (scikit-learn#9405)

6fdfb47

* fix doctest failure in example of linear_model.rst
* fix dtype issue in test_glm_P2_argument

[MRG] Add Generalized Linear Models (scikit-learn#9405)

d489f56

* fix typos in doc

Remove test_glm_P2_argument

809e3a2

Filter out DeprecationWarning in old versions of scipy.sparse.linalg.…

4edce36

…spsolve about usage of umfpack

import pytest

46df5b6

Document arguments of abstract methods

21f2136

Pytest filter warnings use two colons

1faedf8

Christian Lorentzen and others added 15 commits October 6, 2019 16:20

EXA sharey for histograms

15eb1d3

Plot y_pred histograms on the test set

3d097c6

Merge remote-tracking branch 'origin/master' into GLM-minimal

6372287

Merge remote-tracking branch 'upstream/master' into GLM-minimal

b117856

Compound Poisson => Compound Poisson Gamma

31f5b3d

Compound Poisson => Compound Poisson Gamma

a498ff5

Various improvement in Tweedie regression example

3fae28a

Merge remote-tracking branch 'origin/master' into GLM-minimal

a2b6841

Update doc/modules/linear_model.rst

a47798a

Co-Authored-By: Thomas J Fan <thomasjpfan@gmail.com>

Use latest docstring conventions everywhere

83391dd

Drop check_input parameter

3bfb54e

Use keyword only arguments SLEP009

d325fe2

Move _y_pred_deviance_derivative from losses as a private function

661cf56

Fix cumulated claim amount curve in Tweedie regression example

560c180

PEP8

0ea2dce

ogrisel mentioned this pull request Oct 10, 2019

Added gini coefficient to ranking and scorer #10084

Closed

ogrisel added 3 commits October 10, 2019 19:29

WIP implementation of Gini coeff and Lorenz curve

a608c70

Use Lorenz curve in Tweedie example

853f8b7

PEP8

b3b55e8

ogrisel force-pushed the glm-gini-coefficient branch from 06283db to b3b55e8 October 10, 2019 17:30

ogrisel added 2 commits October 11, 2019 09:07

Make sure labels/weights are floats before normalizing

640f017

Update scorer test framework

6dd197a

ogrisel mentioned this pull request Oct 11, 2019

Minimal Generalized linear models implementation (L2 + lbfgs) #14300

Merged

7 tasks

rth mentioned this pull request Mar 2, 2020

Visualization and validation tools for regression #16608

Open

github-actions bot added module:linear_model module:metrics labels Mar 2, 2020

Base automatically changed from master to main January 22, 2021 10:51

lorentzenchr added the Stalled label Feb 16, 2024
