
[WIP] Implement Gini coefficient for model selection with positive regression GLMs #15176


Open · wants to merge 228 commits into base: main

Conversation

@ogrisel (Member) commented Oct 10, 2019

Fixes #10003. This PR builds on top of #14300.

I plan to rebase the last 2 commits on top of #14300 from time to time.

TODO:

  • Add tests
  • Add documentation
  • Add a helper plot_lorenz_curve function?
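
For context, the Gini coefficient used here for model selection is twice the area between the diagonal and the Lorenz curve obtained by ranking samples by increasing predicted value. A minimal NumPy sketch of the idea (hypothetical helper names such as `lorenz_curve` and `gini_coefficient`, not this PR's code):

```python
import numpy as np

def lorenz_curve(y_true, y_pred, sample_weight=None):
    """Cumulative share of exposure vs. cumulative share of observed totals,
    ranking samples by increasing predicted value. Hypothetical helper,
    not scikit-learn API."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if sample_weight is None:
        sample_weight = np.ones_like(y_true)
    sample_weight = np.asarray(sample_weight, dtype=float)
    order = np.argsort(y_pred)
    y_true, sample_weight = y_true[order], sample_weight[order]
    cum_exposure = np.cumsum(sample_weight) / np.sum(sample_weight)
    cum_amount = np.cumsum(sample_weight * y_true) / np.sum(sample_weight * y_true)
    # prepend the origin so the curve starts at (0, 0)
    return np.concatenate([[0.0], cum_exposure]), np.concatenate([[0.0], cum_amount])

def gini_coefficient(y_true, y_pred, sample_weight=None):
    """Twice the area between the diagonal and the Lorenz curve
    (positive when higher predictions pick out the larger outcomes)."""
    x, y = lorenz_curve(y_true, y_pred, sample_weight)
    # trapezoidal rule, written out to avoid depending on np.trapz/np.trapezoid
    area_under_curve = np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0)
    return 1.0 - 2.0 * area_under_curve
```

Note the maximum attainable Gini depends on the distribution of `y_true`, which is why the "normalized" Gini (ratio to the Gini of the oracle model that ranks by `y_true` itself) is often reported for model selection.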

Christian Lorentzen added 30 commits January 7, 2019 20:00
* Fixed pep8
* Fixed flake8
* Rename GeneralizedLinearModel as GeneralizedLinearRegressor
* Use of six.with_metaclass
* PEP257: summary should be on same line as quotes
* Docstring of class GeneralizedLinearRegressor: \ before mu
* Arguments family and link accept strings
* Use of ConvergenceWarning
* GeneralizedLinearRegressor added to doc/modules/classes.rst
* fixed bug: init parameter max_iter
* fix API for family and link:
  default parameter changed to string
  non public variables self._family_instance and self._link_instance
* fixed bug in score, minus sign forgotten
* added check_is_fitted to estimate_phi and score
* added check_array(X) in predict
* replaced lambda functions in TweedieDistribution
* some documentation
* make raw docstrings where appropriate
* make ExponentialDispersionModel (i.e. TweedieDistribution) picklable:
  ExponentialDispersionModel has new properties include_lower_bound,
  method in_y_range is not abstract anymore.
* set self.intercept_=0 if fit_intercept=False, such that it is always defined.
* set score to D2, a generalized R2 with deviance instead of squared error,
  as does glmnet. This also solves issues with
  check_regressors_train(GeneralizedLinearRegressor), which assumes R2 score.
* change of names: weight to weights in ExponentialDispersionModel and to
  sample_weight in GeneralizedLinearRegressor
* add class method linear_predictor
* added L2 penalty
* api change: alpha, l1_ratio, P1, P2, warm_start, check_input, copy_X
* added entry in user guide
* improved docstrings
* helper function _irls_step
* fix some bugs in user guide linear_model.rst
* fix some pep8 issues in test_glm.py
* added test: ridge poisson with log-link compared to glmnet
* fix ValueError message for l1_ratio
* fix ValueError message for P2
* string comparison: use '==' and '!=' instead of 'is' and 'is not'
* fix RuntimeWarnings in unit_deviance of poisson: x*log(x) as xlogy
* added test for fisher matrix
* added test for family argument
* put arguments P1, P2 and check_input from fit to __init__
* added check_input test: is P2 positive definite?
* added solver option: 'auto'
* added coordinate descent solver
* skip doctest for GeneralizedLinearRegressor example
* symmetrize P2 => use P2 = 1/2 (P2+P2')
* better validation of parameter start_params
* bug for sparse matrices for newton-cg solver, function grad_hess
* reduce precision for solver newton-cg in test_poisson_ridge
* remedy doctest issues in linear_model.rst for example of GeneralizedLinearRegressor
* remove unused import of xrange from six
* bug in cd solver for sparse matrices
* higher precision (smaller tol) in test_normal_ridge for sparse matrices
* for each solver a separate precision (tol) in test_poisson_ridge
* improved documentation
* additional option 'zero' for argument start_params
* validation of sample_weight in function predict
* input validation of estimate_phi
* set default fit_dispersion=None
* bug in estimate_phi because of weight rescaling
* test for estimate_phi in normal ridge regression
* extended tests for elastic net poisson
* new helper function _check_weights for validation of sample_weight
* fix white space issue in doctest of linear_model.rst
* fit_dispersion default=None also in docs.
* improved docs.
* fixed input validation of predict
* fixed bug for sample_weight in estimate_phi
* fixed input validation of X in predict
* removed redundant line of code 'd = np.zeros_like(coef)'
* added test to compare to ElasticNet
* deleted identical comment lines
* increased precision in test_normal_enet
* better doc for heavy tailed distributions
* improved input validation and the corresponding tests
* improved input validation and testing of P1
* test case for validation of argument P2
* test case for validation of argument copy_X
* fix doctest failure in example of linear_model.rst

* fix dtype issue in test_glm_P2_argument
@ogrisel force-pushed the glm-gini-coefficient branch from 06283db to b3b55e8 on October 10, 2019 17:30
@ogrisel (Member, Author) commented Oct 11, 2019

The tests fail because a zero sample_weight for some samples is not equivalent to dropping those samples. I am not sure what the correct way to deal with variable exposure in a Lorenz curve would be.
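
As an aside: with a trapezoidal construction of the Lorenz curve, a zero-weight sample only contributes a zero-width segment, so the Gini value does coincide with dropping that sample. A self-contained sketch illustrating this (hypothetical `weighted_gini` helper, not the PR's code; the failing test suggests the implementation under review handles weights differently, e.g. via integer ranks):

```python
import numpy as np

def weighted_gini(y_true, y_pred, sample_weight):
    # Rank by prediction, accumulate weighted exposure and weighted amounts,
    # then integrate the Lorenz curve with the trapezoidal rule.
    order = np.argsort(np.asarray(y_pred, dtype=float))
    w = np.asarray(sample_weight, dtype=float)[order]
    y = np.asarray(y_true, dtype=float)[order]
    x = np.concatenate([[0.0], np.cumsum(w) / w.sum()])
    l = np.concatenate([[0.0], np.cumsum(w * y) / (w * y).sum()])
    area = np.sum(np.diff(x) * (l[1:] + l[:-1]) / 2.0)
    return 1.0 - 2.0 * area

y_true = np.array([1.0, 3.0, 2.0, 5.0])
y_pred = np.array([1.2, 2.5, 2.0, 4.0])
w = np.array([1.0, 1.0, 0.0, 1.0])  # zero weight on the third sample

g_zero = weighted_gini(y_true, y_pred, w)
g_drop = weighted_gini(y_true[w > 0], y_pred[w > 0], np.ones(3))
# Here g_zero == g_drop: the zero-weight sample only adds a zero-width
# trapezoid. Implementations that use integer ranks instead of cumulative
# weights do not have this property.
```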

@lorentzenchr (Member) commented:

@ogrisel Could you rebase or merge master, now that #14300 got merged? Or is there another PR?

Base automatically changed from master to main January 22, 2021 10:51

Successfully merging this pull request may close these issues.

Add sklearn.metrics.cumulative_gain_curve and sklearn.metrics.lift_curve
4 participants