DOC Update plots in Categorical Feature Support in GBDT example #31062


Merged (13 commits, Jul 17, 2025)

Conversation

ArturoAmorQ
Member

@ArturoAmorQ ArturoAmorQ commented Mar 24, 2025

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Inspired by this example's way of showing the test_score/fit_time trade-off, I find a scatter plot easier to read and interpret than bar plots.

This PR also introduces a log scale for fit times.

Current plot:

sphx_glr_plot_gradient_boosting_categorical_main

This PR (most recent render):

sphx_glr_plot_gradient_boosting_categorical_001

Any other comments?

I took the opportunity to show the intermediate HTML diagrams and introduce a wording tweak.


github-actions bot commented Mar 24, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 96b7c7c.

@ogrisel
Member

ogrisel commented Apr 30, 2025

As discussed during the bi-weekly meeting, it would be great to make it explicit that the best models are in the bottom left corner, maybe using a matplotlib arrow annotation there with the text "best models" pointing towards the point at coordinates (0, 0) (or (0.1, 0.1), because of the log scale on the x axis).

Also, could you please add more ticks on the x axis, e.g.: 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4.

Also, please add the TargetEncoder to this plot.
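The arrow annotation and the extra x-axis ticks suggested above could be sketched roughly as follows; the coordinates, axis limits, and tick values here are illustrative placeholders, not the example's actual numbers:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
from matplotlib import ticker

fig, ax = plt.subplots()
ax.set_xscale("log")
ax.set_xlim(0.09, 0.5)
ax.set_ylim(0.05, 0.3)

# Arrow pointing towards the lower-left (fast fit, low error) corner.
ax.annotate(
    "best models",
    xy=(0.1, 0.06),      # arrow head near the corner
    xytext=(0.2, 0.12),  # text placed further up and to the right
    arrowprops=dict(arrowstyle="->"),
)

# Force explicit ticks instead of matplotlib's sparse log-scale defaults.
ticks = [0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4]
ax.xaxis.set_major_locator(ticker.FixedLocator(ticks))
ax.xaxis.set_major_formatter(ticker.FixedFormatter([str(t) for t in ticks]))
ax.xaxis.set_minor_locator(ticker.NullLocator())
```

With a log-scaled axis, matplotlib would otherwise only label powers of ten, so the `FixedLocator`/`FixedFormatter` pair is one way to get the denser labeling requested here.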

@ArturoAmorQ
Member Author

Current result:

image

Also, please add the TargetEncoder to this plot.

As that also requires adding narrative on the interpretation, I would rather leave that to a follow-up PR.

Member

@ogrisel ogrisel left a comment


LGTM, thanks!

Member

@lucyleeow lucyleeow left a comment


Sorry this took me so long to get to!
This is a much nicer graph, thank you!

I couldn't comment on the line but maybe we could add the name of the category handling strategy in the bullet list at the top, i.e.:

- "Dropped": dropping the categorical features
- "One Hot": using a :class:`~preprocessing.OneHotEncoder`
- "Ordinal": using an :class:`~preprocessing.OrdinalEncoder` and treating categories
  as ordered, equidistant quantities
- "Native": using an :class:`~preprocessing.OrdinalEncoder` and relying on the
  :ref:`native category support <categorical_support_gbdt>` of the
  :class:`~ensemble.HistGradientBoostingRegressor` estimator.

Do you think it's worth mentioning somewhere on the graph that the error bars are 1 standard deviation or is it obvious what the error bars are?

I saw the comment on adding TargetEncoder but maybe that could be left to another PR...?

It's nice that we make it clear to the user where the best models are (with the arrow and 'best models' text), but it does confuse me at first:

  • I expect it to be pointing to something, e.g., a scatter point. We could add a circle that it points to, but this is not necessarily clear either
  • The graph axes do not start at (0, 0), so it's possibly misleading?

I can't think how to improve this. Maybe it is clear enough, if people read the axis labels, that smaller error/faster fitting is better? Maybe we could add arrows to the x and y axis labels, e.g., "← faster fitting" and "↓ better model/lower error". Or just explaining it in the text may be enough?

Edit: Maybe we could just have the text 'Best models' in the bottom left corner and no arrow?



class CustomLogFormatter(ticker.Formatter):
    def __call__(self, x, pos=None):
Member


Maybe a docstring here?

Member


Also I'm surprised it's so hard to format superscript in mpl?!

Member


Maybe scientific notation is good enough? e.g., "%1.1e" ?
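The `"%1.1e"` suggestion could be wired up with matplotlib's printf-style formatter, which sidesteps the custom `Formatter` subclass entirely; a minimal sketch:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
from matplotlib import ticker

fig, ax = plt.subplots()
ax.set_xscale("log")

# printf-style scientific notation for the tick labels, e.g. 0.15 -> "1.5e-01".
ax.xaxis.set_major_formatter(ticker.FormatStrFormatter("%1.1e"))
```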

Member Author


You are right, that simplifies code a lot.


def plot_performance_tradeoff(results, title):
    fig, ax = plt.subplots()
    markers = ["s", "o", "^", "x"]
Member


Nitpick: in the last graph it is a bit tricky to quickly tell apart the native and ordinal markers/error bars.
Would it be 'easy' to change the colour of the marker to be the same colour as the scatter points?

image

Member Author


I think that risks leading the reader to think that the marker is an actual scatter point.

Member


That's a good point.

Stupid question, for 'Ordinal', why is the black error bar (?) marker not centered between the horizontal and vertical error bars?

Member


Nevermind, I see that it is in the new rendered doc, for some reason the horizontal error bar was not rendering properly in the first one above

image

coeff_str = f"{coeff:.1f}x"

# Format exponent using Unicode superscripts
superscripts = str.maketrans("-0123456789", "⁻⁰¹²³⁴⁵⁶⁷⁸⁹")
Member


Would math notation work? e.g., "$10^{%d}$"
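The mathtext alternative suggested here could look roughly like the following; the function name and the `coeff`/`exponent` decomposition are mine, mirroring the Unicode-superscript snippet above rather than the PR's final code:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
from matplotlib import ticker
import numpy as np

def log_tick_formatter(x, pos=None):
    """Render a positive tick value as mathtext, e.g. 0.15 -> $1.5 \\times 10^{-1}$."""
    exponent = int(np.floor(np.log10(x)))
    coeff = x / 10**exponent
    return rf"${coeff:.1f} \times 10^{{{exponent}}}$"

fig, ax = plt.subplots()
ax.set_xscale("log")
ax.xaxis.set_major_formatter(ticker.FuncFormatter(log_tick_formatter))
```

Mathtext lets matplotlib typeset the superscript itself, so no `str.maketrans` table of Unicode superscript characters is needed.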

@ArturoAmorQ
Member Author

I saw the comment in #31062 on adding TargetEncoder but maybe that could be left to another PR...?

Yes. I prefer reducing the scope of this PR.

Member

@lucyleeow lucyleeow left a comment


LGTM thanks!

Sorry I meant the words 'faster fitting'/'lower error' may be added to the x and y axis labels.
They do also make sense here, but maybe they are too long inside the graph and 'Best models' was better?

image

@betatim
Member

betatim commented Jul 17, 2025

I'll merge this for now, as it has two approvals and I am looking for something to merge to update the banner on scikit-learn.org.

@betatim betatim merged commit 588f396 into scikit-learn:main Jul 17, 2025
36 checks passed