DOC improve the cost-sensitive learning example #29149

Member

@ogrisel ogrisel commented May 31, 2024

Here is a summary of the proposed changes:

  • better names for metrics to avoid confusion between gains and costs;
  • fixed the definition of the variable cost metric for the fraudulent transaction case to better align with the example in Elkan's paper;
  • removed any reference to balanced accuracy, as the example is already long and I found it was mostly a distraction from the important message on the business metrics;
  • use the new prefit=True option of FixedThresholdClassifier.

/cc @glemaitre @lorentzenchr.

github-actions bot commented May 31, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 5382a42. Link to the linter CI: here

Member

@glemaitre glemaitre left a comment

It looks good to me. I think that @lorentzenchr will be happy to remove the balanced-accuracy mention :)

ogrisel and others added 2 commits May 31, 2024 17:02
Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
Member Author

ogrisel commented May 31, 2024

@lorentzenchr thanks for the review, I think I addressed all your feedback.

@lorentzenchr lorentzenchr merged commit 5d92c35 into scikit-learn:main Jun 3, 2024
32 checks passed
@ogrisel ogrisel deleted the improve-cost-sensitive-learning-example branch June 3, 2024 07:38
Comment on lines -514 to -516
fraudulent_refuse = (mask_true_positive.sum() * 50) + amount[
    mask_true_positive
].sum()
Member

Not sure why the amount here is removed. When I gave a talk on this, this was a nice point to make.

Member

Because you don't gain the amount when rejecting a fraudulent case :). Indeed, you are just not losing it.

The amount itself only contributes when the claim is accepted, through a gain proportional to the amount.
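
As a rough illustration of that reasoning, here is a minimal pure-Python sketch of a gain function in which refusing a fraud only earns a fixed gain. The fixed gain of 50 comes from the removed lines; the fixed cost for refusing a legitimate case, the 2% rate, the loss of the full amount on an accepted fraud, and the function and parameter names are all assumptions made for the example, not the code of the PR.

```python
def business_gain(
    y_true,
    y_pred,
    amount,
    fixed_gain_refuse_fraud=50.0,   # from the removed lines
    fixed_cost_refuse_legit=5.0,    # assumed value, for illustration only
    rate_accept_legit=0.02,         # assumed value, for illustration only
):
    """Total gain; label 1 means fraudulent (truth) or refused (prediction)."""
    total = 0.0
    for truth, pred, amt in zip(y_true, y_pred, amount):
        if truth == 1 and pred == 1:
            # Fraud refused: fixed gain only, the amount is simply not lost.
            total += fixed_gain_refuse_fraud
        elif truth == 1 and pred == 0:
            # Fraud accepted: assumed to lose the full amount.
            total -= amt
        elif truth == 0 and pred == 1:
            # Legitimate case refused: fixed cost.
            total -= fixed_cost_refuse_legit
        else:
            # Legitimate case accepted: gain proportional to the amount.
            total += rate_accept_legit * amt
    return total


# One case of each kind: refused fraud, accepted fraud,
# refused legitimate, accepted legitimate.
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 1, 0]
amount = [200.0, 300.0, 40.0, 1000.0]
total = business_gain(y_true, y_pred, amount)
print(total)
```

With these made-up inputs, the amount of the refused fraud (200) does not appear anywhere in the total, which is the point being made here.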

Member

Yes, but we have the false negative counterpart. Removing this puts massive pressure on avoiding false negatives while being somewhat okay with a low number of true positives, since the amounts are usually a lot larger than 50€.

Member Author

The updated PR brings our example cost matrix more in line with the one proposed in Elkan's 2001 paper on cost-sensitive learning (the fixed costs and gains are not the same, but the variable components are compatible). Here is an excerpt of the relevant paragraph (part of section 1.2):

[Image: excerpt from section 1.2 of Elkan's paper]

I don't understand why we would put a gain that is proportional to the amount in the fraudulent_refuse case. If you catch a fraudster, nobody will pay the bank the amount of the transaction that the fraudster would otherwise have stolen.
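
For a single fraudulent transaction, the two definitions can be compared with a small sketch. It assumes that accepting a fraud costs the full amount (that term is not shown in the removed lines, so it is an assumption), and the function names are hypothetical.

```python
FIXED_GAIN = 50.0  # fixed gain per refused fraud, as in the removed lines


def refused_fraud_gain_old(amount):
    # Removed formula: fixed gain plus the transaction amount.
    return FIXED_GAIN + amount


def refused_fraud_gain_new(amount):
    # Updated formula: fixed gain only, the amount is merely not lost.
    return FIXED_GAIN


def accepted_fraud_gain(amount):
    # Assumed false negative term: the full amount is lost.
    return -amount


def margin_old(amount):
    # Gap between catching and missing the fraud under the removed formula.
    return refused_fraud_gain_old(amount) - accepted_fraud_gain(amount)


def margin_new(amount):
    # Same gap under the updated formula.
    return refused_fraud_gain_new(amount) - accepted_fraud_gain(amount)


for amount in (10.0, 500.0):
    print(amount, margin_old(amount), margin_new(amount))
```

Under these assumptions the gap between catching and missing a fraud is 50 + 2 × amount with the removed formula and 50 + amount with the updated one, so the removed formula counted the amount twice. The gap still grows with the amount in the updated version.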

Member

Fair enough. Would be nice to have this chart in the example actually, makes things quite clear.

Member Author

I'll keep that in mind for a future iteration on this example.
