Add metrics.gini_index_score() #28535

Open
yonil7 opened this issue Feb 26, 2024 · 5 comments · May be fixed by #21320

Comments

@yonil7

yonil7 commented Feb 26, 2024

Describe the workflow you want to enable

The Gini index (which is based on the Lorenz curve) is widely used in the insurance industry to evaluate the performance (ranking power) of risk models.

Describe your proposed solution

Two scikit-learn examples already include code for calculating the Gini index: here and here

Related to #28534

Describe alternatives you've considered, if relevant

No response

Additional context

No response

@yonil7 yonil7 added Needs Triage Issue requires triage New Feature labels Feb 26, 2024
@fkdosilovic
Contributor

Recently, there was a discussion about this in #28144 (PR #28156).

Quote from one of the core developers (#28144 (comment)):

You will find this simple formula implemented in some of our examples, too. Because it is so simple, I'm against adding it to the already large metrics API.

@yonil7
Author

yonil7 commented Feb 28, 2024

I'm not sure, but there seem to be two different Gini index/coefficient definitions:

  1. Defined on the output of a binary classifier:
    Defined here: https://en.wikipedia.org/wiki/Gini_coefficient#Relation_to_other_statistical_measures and in the roc_auc_score() notes (gini = 2 * roc_auc - 1)

  2. Defined on a more general float variable (not necessarily the output of a binary classifier):
    Defined here: https://en.wikipedia.org/wiki/Gini_coefficient and in this code example:

    ordered_samples, cum_claims = lorenz_curve(y_true, y_pred, exposure)
    gini = 1 - 2 * auc(ordered_samples, cum_claims)

    Note that this code does not use roc_auc_score(). It uses lorenz_curve() and then calculates the area under the Lorenz curve (not the area under the ROC curve) using auc().
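To make the difference between the two definitions concrete, here is a minimal dependency-free sketch. It is not the code from the scikit-learn examples: `roc_auc`, `trapezoid_auc`, `gini_from_roc_auc` and `gini_from_lorenz` are hypothetical helpers written for illustration, and `lorenz_curve` is only assumed to behave like the example helper of the same name (rank samples by increasing prediction, accumulate `y_true * exposure`, normalize to end at 1). In practice one would use `roc_auc_score()` and `auc()` from `sklearn.metrics` instead of the hand-written versions.

```python
def roc_auc(y_true, y_score):
    """ROC AUC for 0/1 labels by pairwise comparison (ties count as 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gini_from_roc_auc(y_true, y_score):
    """Definition 1: gini = 2 * roc_auc - 1 (binary classifier output)."""
    return 2 * roc_auc(y_true, y_score) - 1


def lorenz_curve(y_true, y_pred, exposure):
    """Assumed behavior of the example helper; needs at least 2 samples."""
    # Order samples by increasing predicted risk.
    order = sorted(range(len(y_pred)), key=lambda i: y_pred[i])
    cum_claims, total = [], 0.0
    for i in order:
        total += y_true[i] * exposure[i]
        cum_claims.append(total)
    cum_claims = [c / total for c in cum_claims]  # curve ends at 1
    n = len(cum_claims)
    ordered_samples = [i / (n - 1) for i in range(n)]  # evenly spaced on [0, 1]
    return ordered_samples, cum_claims


def trapezoid_auc(x, y):
    """Area under a piecewise-linear curve (trapezoidal rule)."""
    return sum((x[i + 1] - x[i]) * (y[i + 1] + y[i]) / 2 for i in range(len(x) - 1))


def gini_from_lorenz(y_true, y_pred, exposure):
    """Definition 2: gini = 1 - 2 * (area under the Lorenz curve)."""
    ordered_samples, cum_claims = lorenz_curve(y_true, y_pred, exposure)
    return 1 - 2 * trapezoid_auc(ordered_samples, cum_claims)


# Toy data (made up for illustration): a perfectly ranked binary target.
y_true = [0, 0, 1, 1]
y_pred = [0.1, 0.2, 0.8, 0.9]
exposure = [1, 1, 1, 1]

print(gini_from_roc_auc(y_true, y_pred))
print(gini_from_lorenz(y_true, y_pred, exposure))
```

On this toy data the two functions return different values even though the ranking is perfect, which is the point of the distinction above: the ROC-based quantity and the Lorenz-based quantity are not interchangeable without further normalization.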

@fkdosilovic
Contributor

You are right, but I'm assuming that a similar argument (too simple to include in the library) could be made for the Gini coefficient you are requesting (the scikit-learn core developers will know better).

@lorentzenchr
Member

See #21320 (comment).

@lorentzenchr lorentzenchr linked a pull request Mar 8, 2024 that will close this issue
@glemaitre glemaitre removed the Needs Triage Issue requires triage label Mar 11, 2024
@lorentzenchr
Member

Almost a duplicate: #10003.

4 participants