-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[MRG] ENH Add support for dataframe in PDP #14028
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Some questions regarding the expected behaviour:
|
If a pipeline is passed, we should be quantifying in terms of its input
features... If the user wants otherwise they should slice it.
|
After couple of iterations this morning is what I thought as well. This will make thing so difficult otherwise. |
Feel free to ping when you need reviews! (I'll look at #14035) |
I think this is good to be reviewed @thomasjpfan @NicolasHug I will probably add an example to illustrate to we can use the ColumnTransformer |
We would need to merge #15354 before then. |
I dropped the support and put a meaningful error message depending of the type of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I started to review but I have now the following question:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
I think the example https://scikit-learn.org/dev/auto_examples/inspection/plot_partial_dependence.html should be updated to use a data frame and to pass features names instead of feature indices to make it more readable. But this can be done in a later PR if your prefer. As you wish.
``X`` is used both to generate a grid of values for the | ||
``features``, and to compute the averaged predictions when | ||
method is 'brute'. | ||
features : list or array-like of int | ||
features : array-like of {int, str} | ||
The target features for which the partial dependency should be | ||
computed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe make it explicit that it should be either a single feature or a pair of 2 features.
Also "target features" does not really mean anything.
May I suggest the following: "The feature or pair of interacting features for which the partial dependency should be computed."
I updated the examples. However, be aware that we still have to specify |
@NicolasHug Can we merge this one. |
@thomasjpfan as well :P |
Thanks @glemaitre ! |
build on:
This PR adds support for:
ColumnTransformer
as a preprocessing within aPipeline
.Additional features for future PRs:
PolynomialFeatures
) and would be interested in the PDP of these new features. It seems that we will need support forget_feature_names
for the pipeline to ease such use case.