DOC improve the cost-sensitive learning example #29149
DOC improve the cost-sensitive learning example #29149
Conversation
It looks good to me. I think that @lorentzenchr will be happy to remove the balanced-accuracy mention :)
Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
@lorentzenchr thanks for the review, I think I addressed all your feedback.
```python
fraudulent_refuse = (mask_true_positive.sum() * 50) + amount[
    mask_true_positive
].sum()
```
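To make the trade-off under discussion concrete, here is a hedged sketch of a full business-gain function in the spirit of this thread. The function name, the 50€ fixed fee, the 5€ refusal cost, and the 2% commission are illustrative assumptions, not the example's actual values:

```python
import numpy as np

# Illustrative sketch only: constants and names are assumptions, not the
# values used in the scikit-learn example under review.
def business_gain(y_true, y_pred, amount):
    """Total gain for the bank; class 1 = fraudulent, prediction 1 = refuse."""
    mask_tp = (y_true == 1) & (y_pred == 1)  # fraud correctly refused
    mask_fn = (y_true == 1) & (y_pred == 0)  # fraud wrongly accepted
    mask_tn = (y_true == 0) & (y_pred == 0)  # legitimate accepted
    mask_fp = (y_true == 0) & (y_pred == 1)  # legitimate wrongly refused
    gain = 0.0
    gain += mask_tp.sum() * 50               # refusing fraud: fixed gain only
    gain -= amount[mask_fn].sum()            # accepted fraud: lose the amount
    gain += (amount[mask_tn] * 0.02).sum()   # commission on accepted legitimate
    gain -= mask_fp.sum() * 5                # fixed cost of a wrong refusal
    return gain
```

Note that, per the discussion above, the variable `amount` term only enters through accepted transactions (true negatives and false negatives), never through refused fraud.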
I'm not sure why the amount here is removed; when I gave a talk on this, it was a nice point to make.
Because you don't gain the amount when rejecting a fraudulent case :). You just avoid losing it.
The amount itself only contributes when the transaction is accepted, as a proportional gain or loss.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but we have the false-negative counterpart. Removing this puts massive pressure on avoiding false negatives and makes the metric somewhat indifferent to a low number of true positives, since the amounts are usually much larger than 50€.
The updated PR brings our example's cost matrix better in line with the one proposed in Elkan's 2001 paper on cost-sensitive learning (the fixed costs and gains are not the same, but the variable components are compatible). Here is an excerpt of the relevant paragraph (part of section 1.2):
I don't understand why we would put a gain proportional to the amount in the fraudulent_refuse
case. If you catch a fraudster, nobody pays the bank the amount of the transaction the fraudster would otherwise have stolen.
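For reference, Elkan (2001) also gives a closed-form optimal decision threshold for any 2x2 cost matrix. A minimal sketch, using the paper's c(predicted, actual) convention; the cost values below are illustrative, not the ones from our example:

```python
# Hedged sketch of Elkan's (2001) optimal threshold for a 2x2 cost matrix.
# Convention: c10 = cost of a false positive (predict fraud, actually
# legitimate), c01 = cost of a false negative (predict legitimate, actually
# fraud); c00 and c11 are the costs of correct predictions.
def elkan_threshold(c00, c01, c10, c11):
    """Predict positive when p(y=1 | x) >= the returned threshold."""
    return (c10 - c00) / ((c10 - c00) + (c01 - c11))

# With uniform misclassification costs this recovers the usual 0.5 cut-off.
p_star = elkan_threshold(c00=0.0, c01=1.0, c10=1.0, c11=0.0)  # 0.5
```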
Fair enough. It would actually be nice to have this chart in the example; it makes things quite clear.
I'll keep that in mind for a future iteration on this example.
Here is a summary of the proposed changes:

- `prefit=True` option of `FixedThresholdClassifier`.

/cc @glemaitre @lorentzenchr.