-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
ENH add an option to centered ICE and PD #18310
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to document the parameter in the user guide as well.
We need a test in the test_plot_partial_dependence.py
to check that when centered=True
all lines start y=0
.
Also, please add an entry to the change log at |
glemaitre review Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Thanks for the review. I'll update the docs and changelog in the next few days. edit: Ok, done. The whatsnew commit now shows all lines added since I forked the repo. Is that correct? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As for the if ln is not None
clause: This results from faulty indexing in the current plot_partial_dependency
implementation. I'd suggest that I open a separate Issue/PR to deal with this, since it is not exactly connected to this PR.
See my reviews for a proposed solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We still need to have documentation in the user guide and see if it is meaningful to change an example
Yes we need to solve this issue first and open another PR. |
done.
Ok, going to open an Issue once this is merged. |
I would solve the bug first. |
I disagree, since it is imho not productive/effective to postpone a feature introduction due to a bug, which has been there before and which is not directly connected to the new feature. But since you've got a much better overview of all processes in this project, I assume that you are right. :) |
I made a PR there: #18359 |
Using the first sample as reference to center everything seems very arbitrary. I have really no idea how to make sense of the resulting plot.
|
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Hi @ogrisel, |
Ok so this centering method is indeed introduced in Goldstein, A., Kapelner, A., Bleich, J., and Pitkin, E., Peeking Inside the Black Box: VisualizingStatistical Learning With Plots of Individual Conditional Expectation. (2014) Journal of Computa-tional and Graphical Statistics https://arxiv.org/abs/1309.6392 which meets the scikit-learn inclusion criteria (with more than 200 citations). So +1 for leaving this at it is. |
The above reference is already cited in the user guide. Could you please add another ref to in the new paragraph where you introduce centered ICE? |
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
## August 31th, 2021 ### Gael * TODO: Jeremy's renewal, Chiara's replacement, Mathis's consulting gig ### Olivier - input feature names: main PR [#18010](scikit-learn/scikit-learn#18010) that links into sub PRs - remaining (need review): [#20853](scikit-learn/scikit-learn#20853) (found a bug in `OvOClassifier.n_features_in_`) - reviewing `get_feature_names_out`: [#18444](scikit-learn/scikit-learn#18444) - next: give feedback to Chiara on ARM wheel building [#20711](scikit-learn/scikit-learn#20711) (needed for the release) - next: assist Adrin for the release process - next: investigate regression in loky that blocks the cloudpickle release [#432](cloudpipe/cloudpickle#432) - next: come back to intel to write a technical roadmap for a possible collaboration ### Julien - Was on holidays - Planned week @ Nexedi, Lille, from September 13th to 17th - Reviewed PRs - [`#20567`](scikit-learn/scikit-learn#20567) Common Private Loss module - [`#18310`](scikit-learn/scikit-learn#18310) ENH Add option to centered ICE plots (cICE) - Others PRs prior to holidays - [`#20254`](scikit-learn/scikit-learn#20254) - Adapted benchmarks on `pdist_aggregation` to test #20254 against sklearnex - Adapting PR for `fast_euclidean` and `fast_sqeuclidean` on user-facing APIs - Next: comparing against scipy's - Next: Having feedback on [#20254](scikit-learn/scikit-learn#20254) would also help - Next: I need to block time to study Cython code. ### Mathis - `sklearn_benchmarks` - Adapting benchmark script to run on Margaret - Fix issue with profiling files too big to be deployed on Github Pages - Ensure deterministic benchmark results - Working on declarative pipeline specification - Next: run long HPO benchmarks on Margaret ### Arturo - Finished MOOC! - Finished filling [Loïc's notes](https://notes.inria.fr/rgSzYtubR6uSOQIfY9Fpvw#) to find questions with score under 60% (Issue [#432](INRIA/scikit-learn-mooc#432)) - started addressing easy-to-fix questions, resulting in gitlab MRs [#21](https://gitlab.inria.fr/learninglab/mooc-scikit-learn/mooc-scikit-learn-coordination/-/merge_requests/21) and [#22](https://gitlab.inria.fr/learninglab/mooc-scikit-learn/mooc-scikit-learn-coordination/-/merge_requests/22) - currently working on expanding the notes up to 70% - Continued cross-linking forum posts with issues in GitHub, resulting in [#444](INRIA/scikit-learn-mooc#444), [#445](INRIA/scikit-learn-mooc#445), [#446](INRIA/scikit-learn-mooc#446), [#447](INRIA/scikit-learn-mooc#447) and [#448](INRIA/scikit-learn-mooc#448) ### Jérémie - back from holidays, catching up - Mathis' benchmarks - trying to find what's going on with ASV benchmarks (asv should display the versions of all build and runtime depndencies for each run) ### Guillaume - back from holidays - Next: - release with Adrin - check the PR and issue trackers ### TODO / Next - Expand Loïc’s notes up to 70% (Arturo) - Create presentation to discuss my experience doing the MOOC (Arturo) - Help with the scikit-learn release (Olivier, Guillaume) - HR: Jeremy's renewal, Chiara's replacement (Gael) - Mathis's consulting gig (Olivier, Gael, Mathis)
@glemaitre is it still relevant ? do you think you'll find time to refresh it before 1.1 (targeted end of april) ? |
Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
old_min_pd, old_max_pd = pdp_lim.get(n_fx, (min_pd, max_pd)) | ||
min_pd = min(min_pd, old_min_pd) | ||
max_pd = max(max_pd, old_max_pd) | ||
pdp_lim[n_fx] = (min_pd, max_pd) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we center before computing pdp_lim
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently the limits are just a bit wider than if they were computed after the centering, but never too narrow. Can we merge as is and do a follow up PR to better adjust the limits ? @glemaitre @thomasjpfan ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would merge it now and make a small subsequent PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm okay to merge first if @jeremiedbb and @jjerphan are okay with it.
I think that fixing pdp_lim
to take into account of centering is a blocker for release tho. Imagine a case where the first point is [0, -10]
and the last point is [0, 10]
, the range is (-10, 10), which will cut off the plot after centering.
Edit: I think the bounds are okay.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking this over, I think the current implementation is correct. preds_offset
correctly offsets the prediction based on centering.
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Thanks @JoElfner and @glemaitre ! |
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: Julien Jerphanion <git@jjerphan.xyz> Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
This adds the
centered
argument toplot_partial_dependence
and toPartialDependenceDisplay.plot
to allow for plotting cICE-plots, centered PDP plots as well es centered plots forkind='both'
Reference Issue
Fixes #18195
What does this implement/fix? Explain your changes.
By setting
centered=True
, cICE or centered PDP are plotted, see:comments:
flake8 and pytest pass on my machine (somehow the
centered
arg in parametrize got lost, even though it was already there before the first commit...)documentation will follow once this is reviewed