Various improvements and more details in release highlights for 1.5 #29056

ogrisel · 2024-05-20T15:26:16Z

/cc @jeremiedbb.

github-actions · 2024-05-20T15:27:29Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

_Generated for commit: 5aad874. Link to the linter CI: here_

jeremiedbb · 2024-05-20T15:40:37Z

examples/release_highlights/plot_release_highlights_1_5_0.py

+# Note however, that the balanced accuracy is not necessarily the most
+# meaningful model selection metric for a given application. It often makes
+# sense to optimize the decision threshold directly for a business metric of
+# interest. **Custom business metrics can be defined by assigning different costs
+# to false positives and false negatives or different gains to true positives
+# and true negatives.** Furthermore, those costs and gains can depend on ancillary
+# metadata specific to each individual data point such as the amount of a
+# transaction in a fraud detection system.
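
To make the paragraph above concrete, here is a minimal stand-alone sketch of a business metric whose costs depend on per-sample metadata (the transaction amount). It is plain Python, not scikit-learn code, and the names `business_gain`, `best_threshold`, `review_cost` as well as all numeric values are hypothetical, chosen only for illustration.

```python
def business_gain(y_true, y_pred, amounts, review_cost=5.0):
    """Total gain of a set of fraud decisions (hypothetical cost model).

    - every flagged transaction (prediction 1) costs a fixed manual review;
    - every missed fraud (prediction 0, truth 1) loses the transaction amount;
    - correctly accepted legitimate transactions cost nothing.
    """
    total = 0.0
    for truth, pred, amount in zip(y_true, y_pred, amounts):
        if pred == 1:
            total -= review_cost
        elif truth == 1:
            total -= amount
    return total


def best_threshold(y_true, scores, amounts, thresholds):
    """Return (threshold, gain) maximizing the business gain on the given data."""
    best = None
    for threshold in thresholds:
        y_pred = [1 if score >= threshold else 0 for score in scores]
        gain = business_gain(y_true, y_pred, amounts)
        if best is None or gain > best[1]:
            best = (threshold, gain)
    return best


# Toy data: labels, predicted fraud scores and transaction amounts.
y_true = [0, 0, 1, 0, 1, 0]
scores = [0.1, 0.4, 0.35, 0.2, 0.8, 0.6]
amounts = [20, 50, 900, 10, 30, 40]

print(best_threshold(y_true, scores, amounts, thresholds=[0.3, 0.5, 0.7]))
```

On this toy data the lowest candidate threshold wins, because missing the 900-unit fraud costs far more than a few needless reviews. A metric that ignores the amounts could not capture this.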


I'm not sure this belongs in the highlights. This is already explained in the examples. And I mean, all the examples we show in the highlights are always toy examples that are not meaningful, just for the purpose of simple demonstration.

I think optimizing for a custom metric is the single most important use of this new tool. However, the text and the code snippet do not make this point strongly enough. I think we need to explain at least briefly what a custom metric is and why it can often rely on side metadata, to motivate the reader to follow the link and read on to the section that explains cost-sensitive learning with side metadata.

I agree that this would be too complex to put in the code snippet of the release highlights, but I thought this paragraph could serve as a middle ground.

> are always toy examples that are not meaningful, just for the purpose of simple demonstration.

Doing things sub-optimally in the past should not be a reason not to improve in the future ;)

If the demonstration is not realistic enough, it's hard to assess the value of the new feature and justify spending time and effort to learn more in the doc / examples.

> I think we need to explain at least briefly what a custom metric is and why it can often rely on side metadata, to motivate the reader to follow the link and read on to the section that explains cost-sensitive learning with side metadata.

Alright

> Doing things sub-optimally in the past should not be a reason not to improve in the future

I'm not saying it's suboptimal, to me it's good to make extra simple snippets :)

> If the demonstration is not realistic enough, it's hard to assess the value of the new feature and justify spending time and effort to learn more in the doc / examples.

I'm not sure about that. I have the opposite impression. I'd be interested in seeing statistics. To me, it draws more attention if it just shows, in a very simple way, that you can now do something that you couldn't do before. Then if someone wants to see how far they can go, they will dig deeper into the docs.

Could we please use a meaningful custom metric then? And avoid all the balanced accuracy stuff?

Or simply use accuracy. Here for the threshold, it's fine as long as it's not too extreme.

I'm okay with that

roc_auc_score should be invariant to a change in threshold.

Yes, but I do not see how it's related to this discussion. Using a threshold-invariant metric is pretty useless for tuning the decision threshold. Arguably, we should even raise a warning in that case.
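
As a rough illustration of why a ranking metric gives no signal for threshold tuning, the stand-alone sketch below computes a rank-based AUC directly from the scores and compares it with accuracy at several thresholds. It is plain Python and does not call `roc_auc_score`; the helper names `roc_auc` and `accuracy` and the toy data are assumptions made for this example.

```python
def roc_auc(y_true, scores):
    """Rank-based AUC: share of (positive, negative) pairs ranked correctly.

    Ties count for one half. The decision threshold never enters the formula.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(y_true, scores, threshold):
    """Accuracy of the hard predictions obtained by thresholding the scores."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [0, 0, 0, 1, 0, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]

for threshold in (0.2, 0.5, 0.85):
    # The AUC column is constant; only the accuracy column moves.
    print(threshold, roc_auc(y_true, scores), accuracy(y_true, scores, threshold))
```

The AUC depends only on how the scores order the samples, so every candidate threshold gets the same value and a search over thresholds has nothing to optimize.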

I would not mind to use a very short custom metric like.

Alright. I will update this PR accordingly then.

> Yes, but I do not see how it's related to this discussion. Using a threshold-invariant metric is pretty useless for tuning the decision threshold. Arguably, we should even raise a warning in that case.

You can ignore it: I made a suggestion but quickly realised it was nonsense, so I edited my comment 😄

I pushed 09bd60f. Hopefully it will help get our message across while staying concise enough.

I changed the custom function to actually introduce a tradeoff between the two classes; otherwise one would get a dummy classifier without realizing it.
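
The "dummy classifier" pitfall can be sketched with a small stand-alone example. It is plain Python, not the code pushed in 09bd60f, and the metric names, the weights and the toy data are all hypothetical. A metric that only rewards true positives is maximized by predicting the positive class everywhere, while adding costs for false positives and false negatives yields an interior threshold.

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    return tn, fp, fn, tp


def gain_without_tradeoff(y_true, y_pred):
    # Only true positives matter: flagging everything is never penalized.
    tn, fp, fn, tp = confusion_counts(y_true, y_pred)
    return tp


def gain_with_tradeoff(y_true, y_pred):
    # Hypothetical weights: each error type now has a cost.
    tn, fp, fn, tp = confusion_counts(y_true, y_pred)
    return tp - 0.5 * fn - 2 * fp


def tune(metric, y_true, scores, thresholds):
    """Pick the first threshold that maximizes the metric."""
    return max(
        thresholds,
        key=lambda th: metric(y_true, [int(s >= th) for s in scores]),
    )


y_true = [1, 0, 0, 1, 0, 1, 1, 1]
scores = [0.05, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
thresholds = [0.0, 0.35, 0.65, 1.0]

print(tune(gain_without_tradeoff, y_true, scores, thresholds))
print(tune(gain_with_tradeoff, y_true, scores, thresholds))
```

With the first metric the selected threshold is the lowest candidate, which amounts to an always-positive classifier. With the second, the search settles on a threshold that actually separates the two classes.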

examples/release_highlights/plot_release_highlights_1_5_0.py

Co-authored-by: Jérémie du Boisberranger <jeremie@probabl.ai>

jeremiedbb

LGTM, regardless of the resolution of #29056 (comment) and #29056 (comment)

ogrisel · 2024-05-21T13:47:12Z

Thanks for the reviews. I enabled auto-merge.

ogrisel · 2024-05-21T13:47:56Z

@jeremiedbb shall I let you handle the backport to 1.5.X? Otherwise I can do it.

examples/release_highlights/plot_release_highlights_1_5_0.py

ogrisel · 2024-05-21T13:49:42Z

Actually, there is a rendering problem with duplicated confusion matrices. I did not observe this behavior in my VS Code environment...

EDIT: actually I get the same problem in VS Code, so it's not Sphinx-specific. Maybe the `.plot()` calls are redundant.

glemaitre · 2024-05-21T13:54:52Z

I'll have a look at the confusion matrix issue.

ogrisel · 2024-05-21T13:58:56Z

It should be good with @jeremiedbb's fix.

lorentzenchr · 2024-05-21T15:41:27Z

@ogrisel Thanks Olivier for making it better.

ogrisel added this to the 1.5 milestone May 20, 2024

jeremiedbb reviewed May 20, 2024

jeremiedbb added the To backport label May 20, 2024

ogrisel commented May 20, 2024

examples/release_highlights/plot_release_highlights_1_5_0.py (outdated, resolved)

jeremiedbb reviewed May 20, 2024

examples/release_highlights/plot_release_highlights_1_5_0.py (outdated, resolved)

Various improvements and more details in release highlights for 1.5

db03bb8

ogrisel force-pushed the improve-scikit-learn-1.5-release-highlights branch from feac314 to db03bb8 Compare May 20, 2024 16:19

ogrisel and others added 2 commits May 20, 2024 18:22

Ancillary => auxiliary

90f1209

Co-authored-by: Jérémie du Boisberranger <jeremie@probabl.ai>

up to an order of magnitude faster

0e8bd23

jeremiedbb approved these changes May 20, 2024

Revert to original wording.

b1424b6

ogrisel mentioned this pull request May 21, 2024

DRAFT expand experiments to help decide on what do include in the highlights ogrisel/scikit-learn#17

Closed

Use a custom business metric instead of balanced accuracy

09bd60f

lorentzenchr approved these changes May 21, 2024

ogrisel enabled auto-merge (squash) May 21, 2024 13:46

jeremiedbb disabled auto-merge May 21, 2024 13:48

jeremiedbb reviewed May 21, 2024

examples/release_highlights/plot_release_highlights_1_5_0.py (outdated, resolved)

examples/release_highlights/plot_release_highlights_1_5_0.py (outdated, resolved)

jeremiedbb added 2 commits May 21, 2024 15:54

Update examples/release_highlights/plot_release_highlights_1_5_0.py

a8bfc1c

Update examples/release_highlights/plot_release_highlights_1_5_0.py

5aad874

glemaitre self-requested a review May 21, 2024 13:54

jeremiedbb approved these changes May 21, 2024

jeremiedbb enabled auto-merge (squash) May 21, 2024 14:16

jeremiedbb merged commit 071293d into scikit-learn:main May 21, 2024
28 checks passed

jeremiedbb added a commit to jeremiedbb/scikit-learn that referenced this pull request May 21, 2024

DOC Various improvements and more details in release highlights for 1…

d8e56cc

….5 (scikit-learn#29056) Co-authored-by: Jérémie du Boisberranger <jeremie@probabl.ai>

ogrisel deleted the improve-scikit-learn-1.5-release-highlights branch May 21, 2024 15:22

jeremiedbb added a commit that referenced this pull request May 21, 2024

DOC Various improvements and more details in release highlights for 1…

5491dc6

….5 (#29056) Co-authored-by: Jérémie du Boisberranger <jeremie@probabl.ai>

ogrisel mentioned this pull request May 24, 2024

FIX TunedThresholdClassifierCV error or warn with informative message on invalid metrics #29082

Open
