DOC Improve wording in Categorical Feature support example #31864
cc @ogrisel since he had the original comment.
I think it's worth adding it even to this example. I would expect this method to be among the best even when the cardinality of the categorical features is not that large.
How would you rework it?
That sounds challenging to do. The example on the wines dataset is already quite slow to run, so expanding it and using cv=5 instead of cv=3 would make it even slower. At the same time, it's good to have a categorical feature engineering example that runs on real data with a mix of both high and low cardinality features. The underfitting analysis of this faster example is nice and complementary to the other. I think I would keep both.
Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
@ogrisel these are the results for both plots when adding the [two plot images]
To keep both examples, I would have them cross-reference each other, use the same plotting function, and possibly add the native support to the benchmark.
@ArturoAmorQ can I merge?
@lorentzenchr sure, thanks!
@ArturoAmorQ I don't think the size of the error bars of the
Reference Issues/PRs
Follow up from #31062.
What does this implement/fix? Explain your changes.
In #31062 (comment) it was suggested to add `TargetEncoder` to the benchmark, but I realized there's already an example comparing such strategy in the scenario of high cardinality, where it is the most useful.
Instead this PR links to said example and takes the opportunity to:
- `verbose_feature_names_out=False` in the `ordinal_encoder` pipeline (introduced in ENH Specify categorical features with feature names in HGBDT #24889);
- `OrdinalEncoder` in the "Native support" pipeline;
Any other comments?
Maybe we can also rework the above mentioned TargetEncoder example? Even merge both examples?