Added gini coefficient to ranking and scorer #10084

tagomatech · 2017-11-07T20:21:41Z

Added a function at the end of sklearn\metrics\ranking.py to compute the Gini coefficient, which is used in some Kaggle competitions.

I added the corresponding import declaration in sklearn\metrics\__init__.py

Finally, I created a scorer à la sklearn in sklearn\metrics\scorer.py, so that the Gini coefficient can be used across sklearn validation/metrics functions, e.g. cross_val_score.

Reference was taken here and results were checked against several entries on Kaggle and against the sklearn ROC AUC score (it is not rocket science, to be honest).

jnothman

Please add this to metrics/tests/test_common.py and also add specific tests that this matches known scores on toy datasets.

jnothman · 2017-11-07T22:13:24Z

sklearn/metrics/ranking.py

@@ -858,3 +858,33 @@ def ndcg_score(y_true, y_score, k=5):
        scores.append(actual / best)

    return np.mean(scores)
+
+
+def gini(y_true, y_score):


Perhaps name this gini_score for consistency

jnothman · 2017-11-07T22:15:30Z

sklearn/metrics/ranking.py

+    ----------
+    .. [1] David J. Hand  and Robert J. Till (2001).
+            A Simple Generalisation of the Area Under the ROC Curve for
+            Multiple Class Classification Problems. In Machine Learning, 45,


Your implementation does not currently extend to multiclass. You have merely implemented a chance-corrected binary ROC AUC.

qinhanmin2014 · 2017-11-08T01:32:31Z

@tagomatech Could you please explain why we need the Gini coefficient when we already have roc_auc_score? It can almost be replaced by roc_auc_score, and it seems hard to find any reference for its definition and application in ML. I don't think the paper you provided is a good reference. It only states that the Gini index (Gini coefficient?) is equivalent to roc_auc_score, and the whole paper is based on roc_auc_score.
(Forgive me if I got something wrong :) )

tagomatech · 2017-11-08T12:00:27Z

@qinhanmin2014
Adding this function is a small improvement, indeed. Personally, I find it useful when playing around with Kaggle competitions.
As for the sources, there is a lot of confusion about "Gini index", "Gini coefficient", and "Normalized Coefficient". The source I suggested has the virtue of being unambiguous, since it defines Gini in relation to AUC.

qinhanmin2014 · 2017-11-08T12:18:41Z

@tagomatech Thanks.
I think we have reached consensus that:
(1) The metric can almost be replaced by roc_auc_score.
(2) It is difficult to find a reference for its definition and application in ML. Right?
So I might be -1 on the metric.
Also, from my perspective, Kaggle can be an application of our metrics, but it is difficult for it to serve as the (main) origin of our metrics, because in some cases its metrics are designed for special scenarios.
This is only my personal opinion, so feel free to fix the conflicts, make the CIs green, provide more persuasive literature, and wait for the opinion of the core devs.

glemaitre · 2017-11-22T10:56:37Z

I am -1 on merging since the score can be easily computed from the ROC AUC.
I also think there could be some confusion between the Gini impurity used in decision trees and the Gini coefficient.
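
To make that relationship concrete, here is a minimal sketch, assuming the binary case and the common convention Gini = 2 * AUC - 1. The helper names `binary_auc` and `gini_from_auc` are hypothetical and not part of scikit-learn; the AUC is computed in pure Python by pairwise comparison so the snippet has no dependencies.

```python
def binary_auc(y_true, y_score):
    """ROC AUC for binary labels (0/1) via pairwise comparison.

    Equals the fraction of (positive, negative) pairs in which the
    positive sample has the higher score; ties count as one half.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("Both classes must be present in y_true.")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def gini_from_auc(y_true, y_score):
    """Chance-corrected binary AUC (assumed convention: 2 * AUC - 1)."""
    return 2.0 * binary_auc(y_true, y_score) - 1.0


y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
print(binary_auc(y_true, y_score))     # 0.75
print(gini_from_auc(y_true, y_score))  # 0.5
```

With scikit-learn installed, the same quantity would be `2 * roc_auc_score(y_true, y_score) - 1`, which is why a dedicated function adds little in the binary case.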

qinhanmin2014 · 2017-11-22T11:11:41Z

@tagomatech Thanks a lot for your contribution. Sorry, but I'm going to close this one given the additional -1 above. I think the general consensus is that it can be replaced by roc_auc_score and that there is no clear definition.

ogrisel · 2019-10-10T17:14:03Z

Actually, the Gini coefficient is defined in terms of the area under the Lorenz curve (for positive regression models), which is not the same as ROC AUC. I started an undocumented prototype implementation in #15176.
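
As a rough illustration of the Lorenz-curve definition (a sketch under stated assumptions, not the prototype in #15176): observations are ordered by increasing prediction, the cumulative share of the observed non-negative target is plotted against the cumulative share of observations, and the Gini coefficient is taken as 1 minus twice the area under that curve. The function names `lorenz_curve` and `gini_from_lorenz` are hypothetical, and ties in `y_pred` are left in input order.

```python
def lorenz_curve(y_true, y_pred):
    """Return (x, y) points of the Lorenz curve.

    x: cumulative share of observations, ordered by increasing y_pred.
    y: cumulative share of the total of y_true (assumed non-negative).
    """
    total = float(sum(y_true))
    if total <= 0:
        raise ValueError("y_true must be non-negative with a positive sum.")
    n = len(y_true)
    # Stable sort: tied predictions keep their input order (an assumption).
    order = sorted(range(n), key=lambda i: y_pred[i])
    xs, ys = [0.0], [0.0]
    cumulative = 0.0
    for rank, i in enumerate(order, start=1):
        cumulative += y_true[i]
        xs.append(rank / n)
        ys.append(cumulative / total)
    return xs, ys


def gini_from_lorenz(y_true, y_pred):
    """Gini coefficient as 1 - 2 * (trapezoidal area under the Lorenz curve)."""
    xs, ys = lorenz_curve(y_true, y_pred)
    area = sum(
        (xs[k] - xs[k - 1]) * (ys[k] + ys[k - 1]) / 2.0
        for k in range(1, len(xs))
    )
    return 1.0 - 2.0 * area


# Equal targets give a diagonal Lorenz curve and a Gini of 0.
print(gini_from_lorenz([1, 1, 1, 1], [0.2, 0.1, 0.4, 0.3]))
# All of the target mass on the highest-ranked observation.
print(gini_from_lorenz([0, 0, 0, 1], [0.1, 0.2, 0.3, 0.4]))
```

Unlike the ROC-based quantity, this version works on a continuous, non-negative target, so the two only coincide in special cases.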

tagomatech added 4 commits November 7, 2017 18:52

add function to compute the Gini coefficient

b7edf5d

added gini import

9fd878d

added scorer for gini coefficient

8021de4

updated scorer.py

a9cb5c6

jnothman reviewed Nov 7, 2017

qinhanmin2014 closed this Nov 22, 2017

rth mentioned this pull request Mar 2, 2020

Visualization and validation tools for regression #16608

Open
