ENH add an option to centered ICE and PD #18310

JoElfner · 2020-08-31T16:04:36Z

This adds the centered argument to plot_partial_dependence and to PartialDependenceDisplay.plot to allow for plotting cICE-plots, centered PDP plots as well es centered plots for kind='both'

Reference Issue

Fixes #18195

What does this implement/fix? Explain your changes.

By setting centered=True, cICE or centered PDP are plotted, see:

import matplotlib.pyplot as plt
from sklearn.inspection import partial_dependence
from sklearn.inspection import plot_partial_dependence
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import QuantileTransformer
from sklearn.neural_network import MLPRegressor

import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split

cal_housing = fetch_california_housing()
X = pd.DataFrame(cal_housing.data, columns=cal_housing.feature_names)
y = cal_housing.target

y -= y.mean()

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=0
)

print("Training MLPRegressor...")
est = make_pipeline(QuantileTransformer(),
                    MLPRegressor(hidden_layer_sizes=(50, 50),
                                 learning_rate_init=0.01,
                                 early_stopping=True))
est.fit(X_train, y_train)

print('Computing partial dependence plots...')
features = ['MedInc', 'AveOccup', 'HouseAge', 'AveRooms']
display = plot_partial_dependence(
       est, X_train, features, kind="both", subsample=50,
       n_jobs=3, grid_resolution=20
)
display.figure_.suptitle(
    'Partial dependence of house value on non-location features\n'
    'for the California housing dataset, with MLPRegressor'
)
display.figure_.subplots_adjust(hspace=0.3)

comments:

flake8 and pytest pass on my machine (somehow the centered arg in parametrize got lost, even though it was already there before the first commit...)

documentation will follow once this is reviewed

glemaitre

We need to document the parameter in the user guide as well.
We need a test in the test_plot_partial_dependence.py to check that when centered=True all lines start y=0.

sklearn/inspection/_plot/partial_dependence.py

glemaitre · 2020-09-02T12:30:41Z

Also, please add an entry to the change log at doc/whats_new/v*.rst. Like the other entries there, please reference this pull request with :pr: and credit yourself (and other contributors if applicable) with :user:.

glemaitre review Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>

JoElfner · 2020-09-02T14:21:02Z

Thanks for the review. I'll update the docs and changelog in the next few days.

edit: Ok, done. The whatsnew commit now shows all lines added since I forked the repo. Is that correct?

…o master

JoElfner

As for the if ln is not None clause: This results from faulty indexing in the current plot_partial_dependency implementation. I'd suggest that I open a separate Issue/PR to deal with this, since it is not exactly connected to this PR.
See my reviews for a proposed solution.

sklearn/inspection/_plot/partial_dependence.py

sklearn/inspection/_plot/tests/test_plot_partial_dependence.py

glemaitre

We still need to have documentation in the user guide and see if it is meaningful to change an example

doc/whats_new/v0.24.rst

sklearn/inspection/_plot/partial_dependence.py

glemaitre · 2020-09-04T12:12:38Z

As for the if ln is not None clause: This results from faulty indexing in the current plot_partial_dependency implementation. I'd suggest that I open a separate Issue/PR to deal with this, since it is not exactly connected to this PR.
See my reviews for a proposed solution.

Yes we need to solve this issue first and open another PR.

JoElfner · 2020-09-04T14:04:29Z

We still need to have documentation in the user guide and see if it is meaningful to change an example

done.

As for the if ln is not None clause: This results from faulty indexing in the current plot_partial_dependency implementation. I'd suggest that I open a separate Issue/PR to deal with this, since it is not exactly connected to this PR.
See my reviews for a proposed solution.

Yes we need to solve this issue first and open another PR.

Ok, going to open an Issue once this is merged.

glemaitre · 2020-09-08T08:19:33Z

Ok, going to open an Issue once this is merged.

I would solve the bug first.

JoElfner · 2020-09-08T09:13:10Z

Ok, going to open an Issue once this is merged.

I would solve the bug first.

I disagree, since it is imho not productive/effective to postpone a feature introduction due to a bug, which has been there before and which is not directly connected to the new feature. But since you've got a much better overview of all processes in this project, I assume that you are right. :)
So I'll wait until the bug has been fixed.

glemaitre · 2020-09-08T11:48:40Z

I made a PR there: #18359

doc/modules/partial_dependence.rst

ogrisel · 2020-09-09T14:11:01Z

Using the first sample as reference to center everything seems very arbitrary. I have really no idea how to make sense of the resulting plot.

Why not shift by removing the mean value of each ICE line instead?
Do you have any reference in the literature that discusses the centering with the value of the first point?

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

JoElfner · 2020-09-09T14:34:10Z

Hi @ogrisel,
thanks for your review. Removing the mean has several downsides, f.i. outlier sensitivity and having an undefined overlapping location (if any). Thus it is hard to compare each sample's trend.
Another approach is to use a specific quantile of the data as centering anchor, as mentioned in the issue considering this PR. You can also find literature in this post. It may not be "citeable" literature, but at least the online-book provdies references.

ogrisel · 2020-09-09T14:44:05Z

Ok so this centering method is indeed introduced in Goldstein, A., Kapelner, A., Bleich, J., and Pitkin, E., Peeking Inside the Black Box: VisualizingStatistical Learning With Plots of Individual Conditional Expectation. (2014) Journal of Computa-tional and Graphical Statistics https://arxiv.org/abs/1309.6392 which meets the scikit-learn inclusion criteria (with more than 200 citations).

So +1 for leaving this at it is.

ogrisel · 2020-09-09T14:46:15Z

The above reference is already cited in the user guide. Could you please add another ref to in the new paragraph where you introduce centered ICE?

doc/modules/partial_dependence.rst

Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>

## August 31th, 2021 ### Gael * TODO: Jeremy's renewal, Chiara's replacement, Mathis's consulting gig ### Olivier - input feature names: main PR [#18010](scikit-learn/scikit-learn#18010) that links into sub PRs - remaining (need review): [#20853](scikit-learn/scikit-learn#20853) (found a bug in `OvOClassifier.n_features_in_`) - reviewing `get_feature_names_out`: [#18444](scikit-learn/scikit-learn#18444) - next: give feedback to Chiara on ARM wheel building [#20711](scikit-learn/scikit-learn#20711) (needed for the release) - next: assist Adrin for the release process - next: investigate regression in loky that blocks the cloudpickle release [#432](cloudpipe/cloudpickle#432) - next: come back to intel to write a technical roadmap for a possible collaboration ### Julien - Was on holidays - Planned week @ Nexedi, Lille, from September 13th to 17th - Reviewed PRs - [`#20567`](scikit-learn/scikit-learn#20567) Common Private Loss module - [`#18310`](scikit-learn/scikit-learn#18310) ENH Add option to centered ICE plots (cICE) - Others PRs prior to holidays - [`#20254`](scikit-learn/scikit-learn#20254) - Adapted benchmarks on `pdist_aggregation` to test #20254 against sklearnex - Adapting PR for `fast_euclidean` and `fast_sqeuclidean` on user-facing APIs - Next: comparing against scipy's - Next: Having feedback on [#20254](scikit-learn/scikit-learn#20254) would also help - Next: I need to block time to study Cython code. ### Mathis - `sklearn_benchmarks` - Adapting benchmark script to run on Margaret - Fix issue with profiling files too big to be deployed on Github Pages - Ensure deterministic benchmark results - Working on declarative pipeline specification - Next: run long HPO benchmarks on Margaret ### Arturo - Finished MOOC! - Finished filling [Loïc's notes](https://notes.inria.fr/rgSzYtubR6uSOQIfY9Fpvw#) to find questions with score under 60% (Issue [#432](INRIA/scikit-learn-mooc#432)) - started addressing easy-to-fix questions, resulting in gitlab MRs [#21](https://gitlab.inria.fr/learninglab/mooc-scikit-learn/mooc-scikit-learn-coordination/-/merge_requests/21) and [#22](https://gitlab.inria.fr/learninglab/mooc-scikit-learn/mooc-scikit-learn-coordination/-/merge_requests/22) - currently working on expanding the notes up to 70% - Continued cross-linking forum posts with issues in GitHub, resulting in [#444](INRIA/scikit-learn-mooc#444), [#445](INRIA/scikit-learn-mooc#445), [#446](INRIA/scikit-learn-mooc#446), [#447](INRIA/scikit-learn-mooc#447) and [#448](INRIA/scikit-learn-mooc#448) ### Jérémie - back from holidays, catching up - Mathis' benchmarks - trying to find what's going on with ASV benchmarks (asv should display the versions of all build and runtime depndencies for each run) ### Guillaume - back from holidays - Next: - release with Adrin - check the PR and issue trackers ### TODO / Next - Expand Loïc’s notes up to 70% (Arturo) - Create presentation to discuss my experience doing the MOOC (Arturo) - Help with the scikit-learn release (Olivier, Guillaume) - HR: Jeremy's renewal, Chiara's replacement (Gael) - Mathis's consulting gig (Olivier, Gael, Mathis)

jeremiedbb · 2022-03-22T22:35:06Z

@glemaitre is it still relevant ? do you think you'll find time to refresh it before 1.1 (targeted end of april) ?

sklearn/inspection/_plot/partial_dependence.py

doc/whats_new/v1.1.rst

examples/inspection/plot_partial_dependence.py

sklearn/inspection/_plot/partial_dependence.py

sklearn/inspection/_plot/tests/test_plot_partial_dependence.py

sklearn/inspection/_plot/partial_dependence.py

Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>

jeremiedbb

LGTM

doc/modules/partial_dependence.rst

doc/whats_new/v1.1.rst

sklearn/inspection/_plot/partial_dependence.py

thomasjpfan · 2022-04-07T14:42:56Z

sklearn/inspection/_plot/partial_dependence.py

+                old_min_pd, old_max_pd = pdp_lim.get(n_fx, (min_pd, max_pd))
+                min_pd = min(min_pd, old_min_pd)
+                max_pd = max(max_pd, old_max_pd)
+                pdp_lim[n_fx] = (min_pd, max_pd)


Should we center before computing pdp_lim?

Currently the limits are just a bit wider than if they were computed after the centering, but never too narrow. Can we merge as is and do a follow up PR to better adjust the limits ? @glemaitre @thomasjpfan ?

I would merge it now and make a small subsequent PR.

I'm okay to merge first if @jeremiedbb and @jjerphan are okay with it.

I think that fixing pdp_lim to take into account of centering is a blocker for release tho. Imagine a case where the first point is [0, -10] and the last point is [0, 10], the range is (-10, 10), which will cut off the plot after centering.

Edit: I think the bounds are okay.

Looking this over, I think the current implementation is correct. preds_offset correctly offsets the prediction based on centering.

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>

jeremiedbb · 2022-04-07T16:52:21Z

Thanks @JoElfner and @glemaitre !

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: Julien Jerphanion <git@jjerphan.xyz> Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>

centered pdp and ice plots

a253639

github-actions bot added the module:inspection label Aug 31, 2020

JoElfner added 2 commits August 31, 2020 18:15

tests corrected

6794530

tests corrected again

78fdb34

glemaitre self-requested a review September 2, 2020 09:22

glemaitre reviewed Sep 2, 2020

View reviewed changes

Apply suggestions from code review

afa854e

glemaitre review Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>

JoElfner added 5 commits September 2, 2020 16:49

function doc and whatsnew

915b5ac

retract whatsnew and redo it

2028068

Merge branch 'master' of https://github.com/JoElfner/scikit-learn int…

d2c9458

…o master

testing for first y-val to be zero when centering

3568521

test updated

9111514

JoElfner commented Sep 3, 2020

View reviewed changes

sklearn/inspection/_plot/partial_dependence.py Outdated Show resolved Hide resolved

sklearn/inspection/_plot/partial_dependence.py Outdated Show resolved Hide resolved

sklearn/inspection/_plot/tests/test_plot_partial_dependence.py Outdated Show resolved Hide resolved

glemaitre reviewed Sep 4, 2020

View reviewed changes

doc/whats_new/v0.24.rst Outdated Show resolved Hide resolved

sklearn/inspection/_plot/partial_dependence.py Outdated Show resolved Hide resolved

JoElfner added 2 commits September 4, 2020 15:23

redo of whatsnew

db08c81

doc and examples updated

8b06311

ogrisel reviewed Sep 9, 2020

View reviewed changes

doc/modules/partial_dependence.rst Outdated Show resolved Hide resolved

Update doc/modules/partial_dependence.rst

97295c3

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

ogrisel reviewed Sep 9, 2020

View reviewed changes

doc/modules/partial_dependence.rst Outdated Show resolved Hide resolved

glemaitre and others added 2 commits August 31, 2021 11:07

Apply suggestions from code review

dcf5c4b

Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>

merge origin/main

df2eb83

glemaitre added this to the 1.1 milestone Aug 31, 2021

iter

a538c37

glemaitre mentioned this pull request Sep 6, 2021

API Deprecates plot_partial_dependence #20959

Merged

glemaitre removed their assignment Dec 17, 2021

glemaitre self-assigned this Apr 6, 2022

glemaitre added 4 commits April 6, 2022 17:16

Merge remote-tracking branch 'origin/main' into pr/JoElfner/18310

e1783aa

iter

6e5a2bb

iter

169c482

iter

be7fa1f

thomasjpfan reviewed Apr 6, 2022

View reviewed changes

sklearn/inspection/_plot/partial_dependence.py Show resolved Hide resolved

update doc

8862056

jeremiedbb reviewed Apr 7, 2022

View reviewed changes

sklearn/inspection/_plot/partial_dependence.py Outdated Show resolved Hide resolved

glemaitre and others added 3 commits April 7, 2022 14:32

iter

2eafb44

Apply suggestions from code review

a1c9d6d

Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>

add version removal pdp_lim

e2c91a6

glemaitre changed the title ~~ENH Add option to centered ICE plots (cICE)~~ ENH add an option to centered ICE and PD Apr 7, 2022

jeremiedbb approved these changes Apr 7, 2022

View reviewed changes

doc/modules/partial_dependence.rst Outdated Show resolved Hide resolved

doc/whats_new/v1.1.rst Outdated Show resolved Hide resolved

iter

6a7aecf

thomasjpfan reviewed Apr 7, 2022

View reviewed changes

glemaitre and others added 2 commits April 7, 2022 17:06

Update sklearn/inspection/_plot/partial_dependence.py

48d83e8

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>

iter

916e675

jeremiedbb merged commit adb47e7 into scikit-learn:main Apr 7, 2022

thomasjpfan mentioned this pull request Apr 7, 2022

ENH Reduce copying when centering PDPs #23076

Merged

Uh oh!

ENH add an option to centered ICE and PD #18310

ENH add an option to centered ICE and PD #18310

Uh oh!

Conversation

JoElfner commented Aug 31, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reference Issue

What does this implement/fix? Explain your changes.

comments:

Uh oh!

glemaitre left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

glemaitre commented Sep 2, 2020

Uh oh!

JoElfner commented Sep 2, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

JoElfner left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

glemaitre left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

glemaitre commented Sep 4, 2020

Uh oh!

JoElfner commented Sep 4, 2020

Uh oh!

glemaitre commented Sep 8, 2020

Uh oh!

JoElfner commented Sep 8, 2020

Uh oh!

glemaitre commented Sep 8, 2020

Uh oh!

Uh oh!

ogrisel commented Sep 9, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

JoElfner commented Sep 9, 2020

Uh oh!

ogrisel commented Sep 9, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ogrisel commented Sep 9, 2020

Uh oh!

Uh oh!

jeremiedbb commented Mar 22, 2022

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jeremiedbb left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

thomasjpfan Apr 7, 2022

JoElfner commented Aug 31, 2020 •

edited

Loading

JoElfner commented Sep 2, 2020 •

edited

Loading

ogrisel commented Sep 9, 2020 •

edited

Loading

ogrisel commented Sep 9, 2020 •

edited

Loading

thomasjpfan Apr 7, 2022 •

edited

Loading