[MRG] FEA Lift metric and curve #21320
Conversation
Any recommendations or comments? The lift metric seems to have been a desired feature in previous PRs. |
Thanks @nawarhalabi this looks quite nice. However, given that the major bottleneck in scikit-learn is in review time, I'd suggest you break it into smaller contributions, starting with the |
Thanks @jnothman
|
I think that will be helpful for reviewers, yes, although I am not as
available to review personally as once upon a time so I can't be too sure!
|
Just to clarify expectations: While splitting this PR will help in the review process, it is not yet decided to include the proposed functionality. In #21718, we are trying to figure out some principles for model evaluation tools. |
@lorentzenchr do you think these days we'd be including this in the codebase? Or is this something we'd be happier to have in skrub @GaelVaroquaux @ogrisel or maybe scikit-lego? (@koaning) |
I'd be open to adding it to scikit-lego. But at first glance it does feel general enough that it could also live here. If the conclusion is that it's not a great fit for sklearn or skrub then I'll gladly consider it for sklego. |
I think an "ML Gini index" as well as an accompanying graph, the "Cumulative Accuracy Profile" (CAP) [1], for non-negative regression (of which binary classification is an example) would complete our tools for measuring the ranking (discriminative) power of models and generalize the existing AUC and ROC. Note that there is great confusion about terminology. In "Gini Index and Friends", my coauthors and I summarized the existing literature and gave tutorial-like examples. This is now my main reference when I look up terms like "Gini index" (which one?!?!?). [1] Also known as the gain curve and the (cumulative) lift curve; it is almost the inverted Lorenz curve of the empirical distribution generated by the model predictions. |
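To make the relation to AUC concrete, here is a minimal NumPy sketch of a CAP (gain) curve and a normalized Gini index derived from it. The names `cap_curve_sketch` and `gini_sketch` are hypothetical and not part of scikit-learn or this PR, and the construction is one common normalization (area between the model's curve and the diagonal, divided by the same area for a perfect ranking), which may differ in detail from the definitions in the paper mentioned above. Ties in the predictions are simply broken by input order here.

```python
import numpy as np


def cap_curve_sketch(y_true, y_pred):
    """Hypothetical helper: cumulative share of y_true captured when
    samples are visited from highest to lowest prediction."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Stable sort on the negated predictions: descending order, ties keep input order.
    order = np.argsort(-y_pred, kind="stable")
    gain = np.concatenate([[0.0], np.cumsum(y_true[order]) / y_true.sum()])
    fraction = np.arange(len(y_true) + 1) / len(y_true)
    return fraction, gain


def _trapezoid_area(x, y):
    # Plain trapezoidal rule, written out to avoid depending on a NumPy version.
    return float(np.sum((x[1:] - x[:-1]) * (y[1:] + y[:-1]) / 2.0))


def gini_sketch(y_true, y_pred):
    """Hypothetical helper: area between the model's CAP curve and the
    diagonal, normalized by the same area for a perfect ranking."""
    fraction, gain_model = cap_curve_sketch(y_true, y_pred)
    _, gain_perfect = cap_curve_sketch(y_true, y_true)
    area_model = _trapezoid_area(fraction, gain_model) - 0.5
    area_perfect = _trapezoid_area(fraction, gain_perfect) - 0.5
    return area_model / area_perfect


y_true = [1, 0, 1, 1, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
print(gini_sketch(y_true, y_score))
```

For this binary example without tied scores, the AUC is 13/15, and the sketch returns 11/15, which equals 2 * AUC - 1. The same functions accept any non-negative `y_true` with a positive sum, which is the sense in which the curve generalizes beyond binary classification.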
@nawarhalabi would you be able to give this PR an update? |
For sure we should prioritize this action when it comes to the inspection. |
What do you mean by action? |
Wrong word, my brain did not work properly. By "action" I meant that we should have a display for this type of curve. |
Reference Issues/PRs
Partially implements what is in two stale PRs, but in a significantly more complete manner:
What does this implement/fix? Explain your changes.
Implemented the lift_score metric function, with the lift_curve function and the LiftCurveDisplay class.
Lift is a commonly used metric for evaluating the response to ad campaigns. Please read:
This implementation includes:
- sklearn/metrics/_classification.py for calculating a single value
- sklearn/metrics/_ranking.py for calculating an array of lifts based on different positive classification rates
- sklearn/metrics/_plot/lift_curve.py for plotting the lift curve/chart
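For readers unfamiliar with lift, the two computations described above can be sketched in plain NumPy. This is an illustration only: the function names `lift_score_sketch` and `lift_curve_sketch` are hypothetical, and the signatures and edge-case handling of the PR's `lift_score` and `lift_curve` may differ. The sketch assumes binary labels with at least one positive, takes lift to be the precision among predicted positives divided by the overall positive rate, and breaks tied scores by input order.

```python
import numpy as np


def lift_score_sketch(y_true, y_pred):
    """Hypothetical helper: precision of the predicted positives divided
    by the overall positive rate."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    base_rate = y_true.mean()
    # Lift is undefined without predicted positives or without actual positives.
    if y_pred.sum() == 0 or base_rate == 0:
        return float("nan")
    precision = y_true[y_pred].mean()
    return float(precision / base_rate)


def lift_curve_sketch(y_true, y_score):
    """Hypothetical helper: lift obtained by labelling the k highest-scored
    samples as positive, for k = 1..n."""
    y_true = np.asarray(y_true, dtype=float)
    y_score = np.asarray(y_score, dtype=float)
    # Stable sort on the negated scores: descending order, ties keep input order.
    order = np.argsort(-y_score, kind="stable")
    hits = np.cumsum(y_true[order])
    k = np.arange(1, len(y_true) + 1)
    positive_rate = k / len(y_true)  # fraction of samples classified positive
    lift = (hits / k) / y_true.mean()
    return positive_rate, lift


y_true = [1, 0, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
print(lift_score_sketch(y_true, y_pred))
print(lift_curve_sketch(y_true, y_score))
```

In this example 2 of the 3 predicted positives are correct while 3 of 8 samples are positive overall, so the lift is (2/3) / (3/8) = 16/9. The curve always ends at a lift of 1, since classifying every sample as positive reproduces the base rate.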