
DOC Improve wording in Categorical Feature support example #31864


Merged — 4 commits, Aug 6, 2025

Conversation

ArturoAmorQ
Member

Reference Issues/PRs

Follow up from #31062.

What does this implement/fix? Explain your changes.

In #31062 (comment) it was suggested to add TargetEncoder to the benchmark, but I realized there is already an example comparing that strategy in the high-cardinality scenario, where it is most useful.

Instead, this PR links to that example and takes the opportunity to:

  • remove the no-longer-needed verbose_feature_names_out=False in the ordinal_encoder pipeline (introduced in ENH Specify categorical features with feature names in HGBDT #24889);
  • make a general pass on the wording to:
    • remove the corresponding mention of OrdinalEncoder in the "Native support" pipeline;
    • prefer verbs in the present tense;
    • remove redundancies in favor of more informative text;
    • improve the conclusions.

Any other comments?

Maybe we can also rework the above-mentioned TargetEncoder example? Or even merge both examples?


github-actions bot commented Aug 1, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 69f33db.

@adrinjalali
Member

cc @ogrisel since he had the original comment.

@ogrisel ogrisel self-assigned this Aug 4, 2025
@ogrisel
Member

ogrisel commented Aug 4, 2025

In #31062 (comment) it was suggested to add TargetEncoder to the benchmark, but I realized there is already an example comparing that strategy in the high-cardinality scenario, where it is most useful.

I think it's worth adding it to this example too. I would expect this method to be among the best even when the cardinality of the categorical features is not that large.

Maybe we can also rework the above-mentioned TargetEncoder example?

How would you rework it?

Even merge both examples?

That sounds challenging to do. The example on the wines dataset is already quite slow to run, so expanding it and using cv=5 instead of cv=3 would make it even slower. At the same time, it's good to have a categorical feature engineering example that runs on real data with a mix of both high- and low-cardinality features.

The underfitting analysis of this faster example is nice and complementary to the other one. I think I would keep both.

ArturoAmorQ and others added 3 commits August 6, 2025 15:34
Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
@ArturoAmorQ
Member Author

ArturoAmorQ commented Aug 6, 2025

I think it's worth adding it to this example too. I would expect this method to be among the best even when the cardinality of the categorical features is not that large.

@ogrisel these are the results for both plots when adding the TargetEncoder. They seem to be too noisy with the default TargetEncoder(cv=5):

[plots: target_encoding_1, target_encoding_2]

How would you rework it?

To keep both examples, I would cross-reference them, use the same plotting function, and possibly add the native support to the benchmark.

@lorentzenchr
Member

@ArturoAmorQ can I merge?

@ArturoAmorQ
Member Author

@lorentzenchr sure, thanks!

@lorentzenchr lorentzenchr merged commit b824c72 into scikit-learn:main Aug 6, 2025
36 checks passed
@ArturoAmorQ ArturoAmorQ deleted the wording_categorical branch August 7, 2025 09:34
@ogrisel
Member

ogrisel commented Aug 7, 2025

@ogrisel these are the results for both plots when adding the TargetEncoder. They seem to be too noisy with the default TargetEncoder(cv=5):

@ArturoAmorQ I don't think the error bars of the TargetEncoder pipeline are significantly larger than those of the other alternatives. I still think it would be interesting to feature TargetEncoder in this example, as it's usually a competitive alternative to the others (in terms of Pareto optimality) and can furthermore naturally handle both low- and high-cardinality categories. I think the examples are a useful way to make good alternatives discoverable and explain their pros and cons.
