Added gini coefficient to ranking and scorer #10084
Conversation
Please add this to metrics/tests/test_common.py and also add specific tests that this matches known scores on toy datasets.
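For illustration, a toy-data check along those lines might look like the sketch below. It assumes the gini function added in this PR's diff is importable from sklearn.metrics and that it follows the normalized 2 * AUC - 1 convention; neither assumption is guaranteed to match the actual implementation.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
# Assumption: `gini` is the new function proposed in this PR and is exported
# from sklearn.metrics; it is not part of any released scikit-learn.
from sklearn.metrics import gini


def test_gini_toy_data():
    y_true = np.array([0, 0, 1, 1])
    y_score = np.array([0.1, 0.4, 0.35, 0.8])
    # Assumed convention: normalized Gini = 2 * AUC - 1.
    expected = 2 * roc_auc_score(y_true, y_score) - 1
    assert np.isclose(gini(y_true, y_score), expected)
    # A perfect ranking should give a Gini coefficient of 1.
    assert np.isclose(gini([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9]), 1.0)
```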
```
@@ -858,3 +858,33 @@ def ndcg_score(y_true, y_score, k=5):
        scores.append(actual / best)

    return np.mean(scores)


def gini(y_true, y_score):
```
Perhaps name this gini_score for consistency
```
    ----------
    .. [1] David J. Hand and Robert J. Till (2001).
           A Simple Generalisation of the Area Under the ROC Curve for
           Multiple Class Classification Problems. In Machine Learning, 45,
```
Your implementation does not currently extend to multiclass. You have merely implemented a chance-corrected binary ROC AUC.
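For reference, the Hand & Till (2001) generalisation cited in the docstring is what scikit-learn later exposed (from version 0.22, i.e. after this PR) through the multi_class="ovo" option of roc_auc_score:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

X, y = load_iris(return_X_y=True)
proba = LogisticRegression(max_iter=1000).fit(X, y).predict_proba(X)

# Hand & Till's M measure: average pairwise "one-vs-one" AUC over all
# class pairs. It expects class-membership probabilities, not hard labels.
print(roc_auc_score(y, proba, multi_class="ovo"))
```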
@tagomatech Could you please explain why we need the Gini coefficient when we already have roc_auc_score? It can essentially be replaced by roc_auc_score, and it is hard to find any reference about its definition and application in ML. I don't think the paper you provide is a good reference: it only states that the Gini index (Gini coefficient?) is equivalent to roc_auc_score, and the whole paper is based on roc_auc_score.
@qinhanmin2014 |
@tagomatech Thanks. |
I am -1 to merge since the score can be easily computed from the ROC AUC. |
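For context on that objection: for binary ranking problems the normalized Gini coefficient used on Kaggle is a linear rescaling of the ROC AUC, so it is already obtainable from the existing metric, for example:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

auc = roc_auc_score(y_true, y_score)   # 0.75 on this toy example
gini = 2 * auc - 1                     # normalized Gini: 0.5
```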
@tagomatech Thanks a lot for your contribution. Sorry, but I'm going to close this one given the other -1 above. I think the general consensus is that it can be replaced by roc_auc_score and that there is no clear definition.
Actually the Gini coefficient is defined in terms of area under the Lorenz curve (for positive regression models) which is not the same as ROC AUC. I started an undocumented prototype implementation in #15176. |
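To make that distinction concrete, here is a minimal sketch of the Lorenz-curve definition for a non-negative quantity; it is an "inequality" measure and generally differs from 2 * AUC - 1. This is only an illustration of the definition, not code from #15176.

```python
import numpy as np


def lorenz_gini(x):
    """Gini coefficient of a non-negative sample via the Lorenz curve."""
    x = np.sort(np.asarray(x, dtype=float))
    if np.any(x < 0):
        raise ValueError("The Lorenz-curve Gini requires non-negative values.")
    n = x.shape[0]
    # Lorenz curve: cumulative share of the total after sorting ascending.
    lorenz = np.concatenate(([0.0], np.cumsum(x) / x.sum()))
    # Gini = 1 - 2 * (area under the piecewise-linear Lorenz curve).
    area = np.sum((lorenz[:-1] + lorenz[1:]) / 2.0) / n
    return 1.0 - 2.0 * area


print(lorenz_gini([1, 1, 1, 1]))        # 0.0: perfectly even distribution
print(lorenz_gini([0, 0, 0, 100.0]))    # 0.75: highly concentrated
```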
Added a function at the end of sklearn\metrics\ranking.py to compute the Gini coefficient, which is used in some Kaggle competitions. I added the corresponding import declaration in sklearn\metrics\__init__.py. Finally, I created a scorer à la sklearn in sklearn\metrics\scorer.py, so that the Gini coefficient can be used across sklearn validation/metrics functions, e.g. cross_val_score. Reference was taken here, and results were checked against several entries on Kaggle and against sklearn's ROC AUC score (it is not rocket science, to be honest).
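Note that even without modifying sklearn\metrics\scorer.py, a Gini scorer can be assembled from the public API; the gini_score helper below is a hypothetical stand-in that assumes the 2 * AUC - 1 convention rather than this PR's exact code:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, roc_auc_score
from sklearn.model_selection import cross_val_score


def gini_score(y_true, y_score):
    # Hypothetical helper: normalized Gini as 2 * AUC - 1 (Kaggle convention).
    return 2 * roc_auc_score(y_true, y_score) - 1


# needs_threshold=True feeds decision_function / predict_proba outputs to the
# metric (in scikit-learn >= 1.4 this keyword is replaced by response_method).
gini_scorer = make_scorer(gini_score, needs_threshold=True)

X, y = make_classification(n_samples=200, random_state=0)
scores = cross_val_score(LogisticRegression(), X, y, cv=5, scoring=gini_scorer)
print(scores.mean())
```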