[WIP] Implement Gini coefficient for model selection with positive regression GLMs #15176
Open: ogrisel wants to merge 228 commits into scikit-learn:main from ogrisel:glm-gini-coefficient
* Fixed pep8 * Fixed flake8 * Rename GeneralizedLinearModel as GeneralizedLinearRegressor * Use of six.with_metaclass * PEP257: summary should be on same line as quotes * Docstring of class GeneralizedLinearRegressor: \ before mu * Arguments family and link accept strings * Use of ConvergenceWarning
* GeneralizedLinearRegressor added to doc/modules/classes.rst
* fixed bug: init parameter max_iter * fix API for family and link: default parameter changed to string non public variables self._family_instance and self._link_instance * fixed bug in score, minus sign forgotten * added check_is_fitted to estimate_phi and score * added check_array(X) in predict * replaced lambda functions in TweedieDistribution * some documentation
* make raw docstrings where appropriate * make ExponentialDispersionModel (i.e. TweedieDistribution) picklable: ExponentialDispersionModel has new properties include_lower_bound, method in_y_range is not abstract anymore. * set self.intercept_=0 if fit_intercept=False, such that it is always defined. * set score to D2, a generalized R2 with deviance instead of squared error, as does glmnet. This also solves issues with check_regressors_train(GeneralizedLinearRegressor), which assumes R2 score. * change of names: weight to weights in ExponentialDispersionModel and to sample_weight in GeneralizedLinearRegressor * add class method linear_predictor
* added L2 penalty * api change: alpha, l1_ratio, P1, P2, warm_start, check_input, copy_X * added entry in user guide * improved docstrings * helper function _irls_step
* fix some bugs in user guide linear_model.rst * fix some pep8 issues in test_glm.py
* added test: ridge poisson with log-link compared to glmnet * fix ValueError message for l1_ratio * fix ValueError message for P2 * string comparison: use '==' and '!=' instead of 'is' and 'is not' * fix RuntimeWarnings in unit_deviance of poisson: x*log(x) as xlogy * added test for fisher matrix * added test for family argument
* put arguments P1, P2 and check_input from fit to __init__ * added check_input test: is P2 positive definite? * added solver option: 'auto'
* added coordinate descent solver * skip doctest for GeneralizedLinearRegressor example * symmetrize P2 => use P2 = 1/2 (P2+P2') * better validation of parameter start_params
* bug for sparse matrices for newton-cg solver, function grad_hess * reduce precision for solver newton-cg in test_poisson_ridge * remedy doctest issues in linear_model.rst for example of GeneralizedLinearRegressor * remove unused import of xrange from six
* bug in cd solver for sparse matrices * higher precision (smaller tol) in test_normal_ridge for sparse matrices * for each solver a separate precision (tol) in test_poisson_ridge
* improved documentation * additional option 'zero' for argument start_params * validation of sample_weight in function predict * input validation of estimate_phi * set default fit_dispersion=None * bug in estimate_phi because of weight rescaling * test for estimate_phi in normal ridge regression * extended tests for elastic net poisson
* new helper function _check_weights for validation of sample_weight * fix white space issue in doctest of linear_model.rst
* fit_dispersion default=None also in docs. * improved docs. * fixed input validation of predict * fixed bug for sample_weight in estimate_phi
* improved docs
* fixed input validation of X in predict
* redundant line of code 'd = np.zeros_like(coef)'
* added test to compare to ElasticNet * deleted identical comment lines
* increased precision in test_normal_enet
* better doc for heavy tailed distributions
* improved input validation and testing of them
* improved input validation and testing of P1 * test case for validation of argument P2 * test case for validation of argument copy_X
* fix doctest failure in example of linear_model.rst * fix dtype issue in test_glm_P2_argument
* fix typos in doc
…spsolve about usage of umfpack
Co-Authored-By: Thomas J Fan <thomasjpfan@gmail.com>
06283db to b3b55e8
The test fails because a zero sample_weight for some samples is not equivalent to dropping those samples. I am not sure what the correct way to deal with variable exposure in a Lorenz curve would be.
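To make the zero-weight question concrete, here is a minimal sketch of one possible convention. It is not the code in this PR. It assumes `y_true` is a rate per unit of exposure and that `sample_weight` is the exposure, so both axes of the Lorenz curve accumulate weighted quantities. The function names `weighted_lorenz_curve` and `weighted_gini` are hypothetical and do not exist in scikit-learn. Under this assumption a zero-weight sample adds nothing to either cumulative sum, so it behaves like a dropped sample; whether this is the right treatment of exposure remains open.

```python
import numpy as np


def weighted_lorenz_curve(y_true, y_pred, sample_weight=None):
    """Lorenz curve of y_true ordered by y_pred (hypothetical helper).

    Assumption: y_true is a rate per unit exposure and sample_weight
    is the exposure, so w * y_true is the observed total per sample.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if sample_weight is None:
        w = np.ones_like(y_true)
    else:
        w = np.asarray(sample_weight, dtype=float)

    # Rank samples from lowest to highest predicted value.
    order = np.argsort(y_pred, kind="stable")
    w_sorted = w[order]
    wy_sorted = w_sorted * y_true[order]

    # Cumulative shares of exposure (x) and of observed totals (y),
    # both starting at 0 so the curve begins at the origin.
    cum_w = np.concatenate([[0.0], np.cumsum(w_sorted)])
    cum_wy = np.concatenate([[0.0], np.cumsum(wy_sorted)])
    return cum_w / cum_w[-1], cum_wy / cum_wy[-1]


def weighted_gini(y_true, y_pred, sample_weight=None):
    """Gini coefficient as 1 - 2 * area under the Lorenz curve."""
    x, y = weighted_lorenz_curve(y_true, y_pred, sample_weight)
    # Trapezoidal rule written out explicitly.
    area = np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0)
    return 1.0 - 2.0 * area


rng = np.random.RandomState(0)
y_true = rng.gamma(shape=2.0, size=50)
y_pred = y_true + rng.normal(scale=0.5, size=50)
weights = rng.uniform(0.5, 2.0, size=50)
weights[::5] = 0.0  # give some samples zero exposure
mask = weights > 0

gini_zero_weight = weighted_gini(y_true, y_pred, weights)
gini_dropped = weighted_gini(y_true[mask], y_pred[mask], weights[mask])
print(gini_zero_weight, gini_dropped)
```

If `y_true` were instead an observed total rather than a rate, the y-axis would accumulate `y_true` without the weight, and a zero-weight sample would still shift the curve. That may be one source of the discrepancy described above.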
Fixes #10003. This PR builds on top of #14300.
I plan to rebase the last 2 commits on top of #14300 from time to time.
TODO: