Skip to content

[MRG] Partial dependence plots -- continued #12599

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 137 commits into from
Apr 24, 2019

Conversation

NicolasHug
Copy link
Member

@NicolasHug NicolasHug commented Nov 15, 2018

Reference Issues/PRs

This is the continuation of #5653. Lots of tests and checks have been added, with a few fixes. More detailed changes from #5653 can be found here #5653 (comment).

Closes #5653
Closes #4405

What does this implement/fix? Explain your changes.

This PR implements support for partial dependence plots with any regressor or classifier that has predict_proba.

Any other comments?

Partial dependence plots were only supported for gradient boosting trees and the functions were in ensemble.partial_dependence. All backward compatibility is preserved, but those functions are deprecated in favor of the sklearn.partial_dependence module.

The grid param has been removed in the new implementation.
The ax param was replaced by fig. The claim that the new plots were plotted on the ax argument was wrong.

Some possible improvements

In separate PRs:

  • Add support for 3D plots instead of just a contour
  • Add support for https://arxiv.org/pdf/1309.6392.pdf. Basically, instead of averaging the responses, we actually plot all of them. That would only be relevant for 1-way PDP.
  • Add support for plotting the same PDP of different estimators on the same matplotlib ax object.

Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made the change to the CI to trigger the changes.

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you give me a sense of what else has changed? It's hard to get from comparing commits since files have moved etc.

@NicolasHug
Copy link
Member Author

Can you give me a sense of what else has changed? It's hard to get from comparing commits since files have moved etc.

Apart from moving files to inspection, not much has changed since you last saw it.

@jnothman
Copy link
Member

I'm happy to merge after those three comments are addressed.

@jnothman jnothman merged commit 6b7dc9b into scikit-learn:master Apr 24, 2019
@jnothman
Copy link
Member

Thanks and congrats, @NicolasHug and @trevorstephens!!

@NicolasHug
Copy link
Member Author

Thanks a lot for the reviews @glemaitre @jnothman

@trevorstephens
Copy link
Contributor

Wow, congrats @NicolasHug on getting this done, bravo!!

@amueller
Copy link
Member

Yay!!

jeremiedbb pushed a commit to jeremiedbb/scikit-learn that referenced this pull request Apr 25, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019
@ralphhaygood
Copy link

The grid param has been removed in the new implementation

Which breaks my package sklearn-gbmi (which computes Friedman-Popescu H statistics to identify interactions between features in gradient boosted regression trees).

Looking at the source code, I don't understand why you did this. It seems to me the function could still accommodate the specified-grid operating mode.

Anyhow, is there some simple work-around I'm not seeing? I'd appreciate your advice.

@ralphhaygood
Copy link

GitHub doesn't seem to be making it obvious, so I'll clarify that the preceding comment is directed to @NicolasHug.

@NicolasHug
Copy link
Member Author

Honestly I don't exactly remember.

I assume it's because with the additional support of the brute method, the interactions between method, X and grid were becoming too intricate to be reasonably intuitive.

Unless something went wrong, you should have been warned long ago about the deprecation and ultimate removal of these previous functions.

@NicolasHug
Copy link
Member Author

Anyhow, is there some simple work-around I'm not seeing? I'd appreciate your advice.

You can still pass X and the grid will be automatically computed from there. Though I'm not sure that solves your issue

@ralphhaygood
Copy link

Unless something went wrong, you should have been warned long ago about the deprecation and ultimate removal of these previous functions.

I wasn't. I was aware of your efforts to support a broader class of models, which I applaud. I wasn't aware of the removal of the specified-grid operating mode until users of my package informed me the package was broken. (I haven't been using it myself recently.)

Though I'm not sure that solves your issue

It doesn't.

I could work around the problem by writing my own version of the partial_dependence function, supporting exactly my use case. However, it would depend on functions that aren't parts of the public API and that therefore could be changed at any time. It would be safer to create my own complete implementation. I don't have time to deal with either option right now, but I'll decide what to do and do it early next month.

@jnothman
Copy link
Member

jnothman commented Jun 20, 2020 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Partial Dependence Plots for Random Forests.
9 participants