[MRG] ENH Add support for dataframe in PDP #14028

glemaitre · 2019-06-05T14:51:05Z

build on:

This PR adds support for:

pandas DataFrame
ColumnTransformer as a preprocessing within a Pipeline.

Additional features for future PRs:

Requesting the PDP of augmented features. For instance, one might compute interactions between features (e.g. with PolynomialFeatures) and be interested in the PDP of these new features. It seems that we will need get_feature_names support in the pipeline to ease such a use case.
Support of categorical data (bar plot should be OK for those).
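
For readers following along, here is a minimal sketch of the two supported cases listed above: a pandas DataFrame as `X`, and a `ColumnTransformer` used as preprocessing inside a `Pipeline`. The column names and the synthetic data are made up for illustration. The return type of `partial_dependence` has changed across scikit-learn releases (a tuple in older versions, a `Bunch` in newer ones), so the snippet handles both rather than assuming one.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.inspection import partial_dependence
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic data; the column names are hypothetical.
rng = np.random.RandomState(0)
X = pd.DataFrame({
    "age": rng.uniform(20, 60, size=100),
    "income": rng.uniform(1, 5, size=100),
    "city": rng.choice(["a", "b"], size=100),
})
y = 2.0 * X["age"] + X["income"] + (X["city"] == "a").astype(float)

# Preprocessing selects DataFrame columns by name.
preprocessor = ColumnTransformer([
    ("num", StandardScaler(), ["age", "income"]),
    ("cat", OneHotEncoder(), ["city"]),
])
model = make_pipeline(preprocessor, LinearRegression()).fit(X, y)

# The feature is requested by its column name in the pipeline's input.
result = partial_dependence(model, X, features=["age"], grid_resolution=10)

# Newer releases return a Bunch with an "average" key, older ones a tuple.
averaged = np.asarray(result["average"] if hasattr(result, "keys") else result[0])
print(averaged.shape)
```

Because the target here is an increasing linear function of `age`, the averaged predictions should increase along the grid.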

glemaitre · 2019-06-06T09:14:04Z

Some questions regarding the expected behaviour:

When using a pipeline, do we expect values to reflect the data scale after or before preprocessing? Values before preprocessing would require support for inverse_transform, which will be challenging for some transformers (e.g., ColumnTransformer, PolynomialFeatures, etc.);
In the case of data augmentation within a pipeline (e.g., PolynomialFeatures), one might be interested in the partial dependence on an augmented feature.

jnothman · 2019-06-06T12:01:47Z

If a pipeline is passed, we should be quantifying in terms of its input features... If the user wants otherwise they should slice it.

glemaitre · 2019-06-06T12:13:46Z

> If a pipeline is passed, we should be quantifying in terms of its input features... If the user wants otherwise they should slice it.

After a couple of iterations this morning, this is what I thought as well. Anything else would make things much more difficult. get_feature_names will be handy in case a user needs to slice the pipeline.
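
A rough sketch of what "slice it" could look like in practice, assuming a simple two-step pipeline on made-up data. Passing the full pipeline yields a grid in the units of the input features, while slicing off the preprocessing and passing the final estimator with the transformed data yields a grid in the transformed scale. The helper `grid_of` is hypothetical and only papers over the differences in return type between scikit-learn releases.

```python
import numpy as np
import pandas as pd
from sklearn.inspection import partial_dependence
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def grid_of(result):
    """Return the grid of the first requested feature (hypothetical helper)."""
    if hasattr(result, "keys"):  # Bunch in newer releases
        key = "grid_values" if "grid_values" in result else "values"
        return np.asarray(result[key][0])
    return np.asarray(result[1][0])  # (averaged, values) tuple in older releases


# Synthetic data; the column names are made up.
rng = np.random.RandomState(0)
X = pd.DataFrame({
    "age": rng.uniform(20, 60, size=200),
    "income": rng.uniform(1, 5, size=200),
})
y = 2.0 * X["age"] + X["income"]

model = make_pipeline(StandardScaler(), LinearRegression()).fit(X, y)

# Full pipeline: the grid is expressed in terms of the input features.
raw_grid = grid_of(
    partial_dependence(model, X, features=["age"], grid_resolution=5)
)

# Sliced pipeline: transform first, then inspect the final estimator only.
X_scaled = model[:-1].transform(X)
scaled_grid = grid_of(
    partial_dependence(model[-1], X_scaled, features=[0], grid_resolution=5)
)

print(raw_grid)     # values in the original "age" range
print(scaled_grid)  # values in standardized units
```

After slicing, the transformed data is a plain array here, so the feature is addressed by index rather than by name.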

NicolasHug · 2019-06-07T13:27:04Z

Feel free to ping when you need reviews! (I'll look at #14035)

glemaitre · 2019-07-23T16:47:27Z

I think this is good to be reviewed @thomasjpfan @NicolasHug

I will probably add an example to illustrate how we can use the ColumnTransformer.

glemaitre · 2019-10-24T12:33:31Z

We would need to merge #15354 before then.

glemaitre · 2019-10-28T15:03:39Z

I dropped the support and put a meaningful error message depending on the type of X.

ogrisel

I started to review, but I now have the following question:

sklearn/inspection/_partial_dependence.py

ogrisel

LGTM.

I think the example https://scikit-learn.org/dev/auto_examples/inspection/plot_partial_dependence.html should be updated to use a data frame and to pass feature names instead of feature indices to make it more readable. But this can be done in a later PR if you prefer. As you wish.

ogrisel · 2019-10-30T15:58:42Z

sklearn/inspection/_partial_dependence.py

        ``X`` is used both to generate a grid of values for the
        ``features``, and to compute the averaged predictions when
        method is 'brute'.
-    features : list or array-like of int
+    features : array-like of {int, str}
        The target features for which the partial dependency should be
        computed.


Maybe make it explicit that it should be either a single feature or a pair of 2 features.

Also "target features" does not really mean anything.

May I suggest the following: "The feature or pair of interacting features for which the partial dependency should be computed."
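
To make the "single feature or pair of features" distinction concrete, here is a small sketch on made-up data. With one feature the averaged predictions lie on a 1D grid; with a pair they lie on a 2D grid. The helper `averaged_of` is hypothetical and only handles the different return types across scikit-learn releases.

```python
import numpy as np
import pandas as pd
from sklearn.inspection import partial_dependence
from sklearn.linear_model import LinearRegression


def averaged_of(result):
    """Return the averaged predictions (hypothetical helper)."""
    if hasattr(result, "keys"):  # Bunch in newer releases
        return np.asarray(result["average"])
    return np.asarray(result[0])  # (averaged, values) tuple in older releases


# Synthetic data; the column names are made up.
rng = np.random.RandomState(0)
X = pd.DataFrame({
    "age": rng.uniform(20, 60, size=100),
    "income": rng.uniform(1, 5, size=100),
    "height": rng.uniform(150, 200, size=100),
})
y = 2.0 * X["age"] + X["income"] - 0.5 * X["height"]
model = LinearRegression().fit(X, y)

# A single feature: one grid axis.
single = averaged_of(
    partial_dependence(model, X, features=["age"], grid_resolution=8)
)

# A pair of features: two grid axes.
pair = averaged_of(
    partial_dependence(model, X, features=["age", "income"], grid_resolution=8)
)

print(single.shape)
print(pair.shape)
```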

ogrisel · 2019-10-30T16:06:32Z

Same remark for https://scikit-learn.org/dev/auto_examples/plot_partial_dependence_visualization_api.html

glemaitre · 2019-10-30T17:15:25Z

I updated the examples. However, be aware that we still have to specify both features and feature_names: I only updated partial_dependence, not plot_partial_dependence. The next PR will make plot_partial_dependence infer feature_names from a dataframe, so that one only has to give a list of feature names.

glemaitre · 2019-10-31T09:50:55Z

@NicolasHug Can we merge this one?

glemaitre · 2019-10-31T09:51:16Z

@thomasjpfan as well :P

NicolasHug · 2019-10-31T16:25:01Z

Thanks @glemaitre !

TST add test to ensure support of pipeline in PDP

33868d1

glemaitre force-pushed the pdp_dataframe branch from 7ae21e6 to 35325f2 Compare June 5, 2019 18:54

EHN add support for dataframe in PDP

f2035fe

glemaitre force-pushed the pdp_dataframe branch from 35325f2 to f2035fe Compare June 5, 2019 23:06

glemaitre closed this Jun 6, 2019

glemaitre reopened this Jun 6, 2019

glemaitre added 4 commits June 6, 2019 14:47

revert to brute method for pipeline

133c116

refactor common part with columntransformer

79156f3

fix

59cb6f5

TST check the support of different types for features

cb4b00b

glemaitre mentioned this pull request Jun 6, 2019

[MRG] EHN add parameter axis in safe_indexing to slice rows and columns #14035

Merged

glemaitre mentioned this pull request Jun 13, 2019

[MRG] TST add test for pipeline in partial dependence #14079

Merged

glemaitre added 8 commits July 18, 2019 14:59

Merge remote-tracking branch 'origin/master' into pdp_dataframe

db0b589

problem merge

c04dcba

PEP8

0326a88

issue merge

2f0f690

iter

33e655d

fix

0194717

PEP8

72ee546

update docstring

db25ee6

glemaitre changed the title ~~[WIP] Add support for dataframe in PDP~~ [MRG] Add support for dataframe in PDP Jul 23, 2019

glemaitre added 4 commits July 23, 2019 18:51

whats new

60b8f59

EHN add support for scalar, slice and mask in safe_indexing axis=0

c01385c

DOC

0e5c037

FIX behaviour when passing None

f5e08c4

glemaitre mentioned this pull request Oct 24, 2019

MAINT revert unecessary deprecation for permutation_importance module #15354

Merged

glemaitre added 4 commits October 24, 2019 15:56

Merge remote-tracking branch 'origin/master' into pdp_dataframe

13d39f8

reduce list of estimator to check for fitness

a5777ad

remove unused import

0aa3cd9

fix

dc56f7b

thomasjpfan reviewed Oct 24, 2019

glemaitre added 3 commits October 24, 2019 22:40

address thomas comments

e1de4a4

remove support for slice

53cdf4a

Merge remote-tracking branch 'glemaitre/pdp_dataframe' into pdp_dataframe

09c7e7f

glemaitre added 2 commits October 28, 2019 22:40

Merge remote-tracking branch 'origin/master' into pdp_dataframe

42284d6

Merge remote-tracking branch 'glemaitre/pdp_dataframe' into pdp_dataframe

8dee997

ogrisel reviewed Oct 29, 2019

adrinjalali removed the Waiting for Reviewer label Oct 29, 2019

glemaitre added 3 commits October 30, 2019 10:48

Merge remote-tracking branch 'origin/master' into pdp_dataframe

1162ca7

add accept_slice to _determine_key_dtype

fa9f04a

docstring

f7f7096

ogrisel approved these changes Oct 30, 2019

glemaitre added 3 commits October 30, 2019 17:56

docstring

8029cf4

docstring

a187e0c

update example

46aea93

thomasjpfan approved these changes Oct 31, 2019

thomasjpfan merged commit 7ea7861 into scikit-learn:master Oct 31, 2019

glemaitre mentioned this pull request Nov 1, 2019

ENH get column names by default in PDP when passing data… #15429

Merged
