Skip to content

Various improvements and more details in release highlights for 1.5 #29056

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

ogrisel
Copy link
Member

@ogrisel ogrisel commented May 20, 2024

@ogrisel ogrisel added this to the 1.5 milestone May 20, 2024
Copy link

github-actions bot commented May 20, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 5aad874. Link to the linter CI: here

Comment on lines 79 to 86
# Note however, that the balanced accuracy is not necessarily the most
# meaningful model selection metric for a given application. It often makes
# sense to optimize the decision threshold directly for a business metric of
# interest. **Custom business metrics can be defined by assigning different costs
# to false positives and false negatives or different gains to true positives
# and true negatives.** Furthermore, those costs and gains can depend on ancilary
# metadata specific to each individual data point such as the amount of a
# transaction in a fraud detection system.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure this belongs to the highlights. This is already explained in the examples. And I mean all examples we show in the highlights are always not meaningful toy examples just for the purpose on simple demonstration.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think optimizing for custom is the single most important use of this new tool. However the text and the code snippet do not make this point strong enough. I think we need to explain a minimum what a custom metric is and why it can often rely on side metadata to give motivation to the reader to follow the link and read until the section that explains cost-sensitive learning with side metadata.

I agree that this will be too complex to put in the code snippet of the release highlight but I thought this paragraph could help as a middle ground.

are always not meaningful toy examples just for the purpose on simple demonstration.

Doing things sub-optimally in the past should not be a reason to no improve for the future ;)

If the demonstration is not realistic enough, it's hard to assess the value of the new feature and justify spending time and effort to learn more in the doc / examples.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to explain a minimum what a custom metric is and why it can often rely on side metadata to give motivation to the reader to follow the link and read until the section that explains cost-sensitive learning with side metadata.

Alright

Doing things sub-optimally in the past should not be a reason to no improve for the future

I'm not saying it's suboptimal, to me it's good to make extra simple snippets :)

If the demonstration is not realistic enough, it's hard to assess the value of the new feature and justify spending time and effort to learn more in the doc / examples.

I'm not sure about that. I have the opposite impression. I'd be interested in seing statistics. To me, it drives more attention if it just shows that you can now do something that you couldn't do before in a very simple way. Then if someone wants to see how far he can go, he will dig the docs deeper.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we please use a meaningful custom metric then? And avoid all the balanced accuracy stuff?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or simply use accuracy. Here for the threshold, it‘s fine as long as it’s not too extreme.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay with that

Copy link
Member Author

@ogrisel ogrisel May 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

roc_auc_score should be invariant to a change in threshold.

Yes but I do not see how it's related to this discussion. Using a threshold-invariant metric is pretty useless to tune the decision threshold. Arguably we should even raise a warning in that in that case.

I would not mind to use a very short custom metric like.

Alright. I will update this PR accordingly then.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes but I do not see how it's related to this discussion. Using a threshold-invariant metric is pretty useless to tune the decision threshold. Arguably we should even raise a warning in that in that case.

you can ignore, I made a suggestion but quickly realised it was non-sense so I edited my comment 😄

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed 09bd60f. Hopefully it will help getting our message through while staying concise enough.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the custom function to actually introduce a tradeoff between the two classes otherwise one would get a dummy classifier without realizing it.

@jeremiedbb jeremiedbb added the To backport PR merged in master that need a backport to a release branch defined based on the milestone. label May 20, 2024
@ogrisel ogrisel force-pushed the improve-scikit-learn-1.5-release-highlights branch from feac314 to db03bb8 Compare May 20, 2024 16:19
ogrisel and others added 2 commits May 20, 2024 18:22
Co-authored-by: Jérémie du Boisberranger <jeremie@probabl.ai>
Copy link
Member

@jeremiedbb jeremiedbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, regardless of the resolution of #29056 (comment) and #29056 (comment)

@ogrisel ogrisel enabled auto-merge (squash) May 21, 2024 13:46
@ogrisel
Copy link
Member Author

ogrisel commented May 21, 2024

Thanks for the reviews. I enabled auto-merge.

@ogrisel
Copy link
Member Author

ogrisel commented May 21, 2024

@jeremiedbb I let you handle the backport to 1.5.X? Otherwise I can do the backport.

@jeremiedbb jeremiedbb disabled auto-merge May 21, 2024 13:48
@ogrisel
Copy link
Member Author

ogrisel commented May 21, 2024

Actually there is rendering problem with doubled confusing matrices. I did not observe this behavior in my VS Code environment...

image

EDIT: actually I get the same problem in VS Code. So it's not sphinx specific. Maybe the .plot() are redundant.

@glemaitre
Copy link
Member

I'll have a look at the confusion matrix issue.

@glemaitre glemaitre self-requested a review May 21, 2024 13:54
@ogrisel
Copy link
Member Author

ogrisel commented May 21, 2024

I should be good with @jeremiedbb's fix.

@jeremiedbb jeremiedbb enabled auto-merge (squash) May 21, 2024 14:16
@jeremiedbb jeremiedbb merged commit 071293d into scikit-learn:main May 21, 2024
28 checks passed
jeremiedbb added a commit to jeremiedbb/scikit-learn that referenced this pull request May 21, 2024
….5 (scikit-learn#29056)

Co-authored-by: Jérémie du Boisberranger <jeremie@probabl.ai>
@ogrisel ogrisel deleted the improve-scikit-learn-1.5-release-highlights branch May 21, 2024 15:22
@lorentzenchr
Copy link
Member

@ogrisel Thanks Olivier for making it better.

jeremiedbb added a commit that referenced this pull request May 21, 2024
….5 (#29056)

Co-authored-by: Jérémie du Boisberranger <jeremie@probabl.ai>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
To backport PR merged in master that need a backport to a release branch defined based on the milestone.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants