[MRG] Partial dependence plots -- continued #12599
Merged
137 commits
57f9a6f
general partial dependence plots
trevorstephens 9a09888
add init
trevorstephens e714e16
implement exact and estimated methods
trevorstephens 19ed28e
support for Pipeline and GridSearchCV type estimators
trevorstephens 6fafc5e
add multioutput support
trevorstephens 152a190
rebase and catch up to #6762, #7673, #7846
trevorstephens 2cdc5ea
catch up on #9434
trevorstephens ba1f8da
initial update of plot_partial_dependence
trevorstephens 1b1d8f0
deprecate ensemble.partial_dependence
trevorstephens 9095305
refactor estimated and exact functions to _predict
trevorstephens 3fc1727
make "auto" the default rather than None for method
trevorstephens 259ec99
some more refactoring
trevorstephens cbc20af
avoid namespace collision
trevorstephens 63da115
fix output shapes of all estimators
trevorstephens 8f7d2b0
add tests to ensure all estimators output same shape
trevorstephens 6fc3a49
quick fixes
trevorstephens b1f8bfc
fix docstring, test fails
trevorstephens dc93b69
refactor tests for easier debugging
trevorstephens cd8f8de
speed up tests, add two-way plot test
trevorstephens 4eb1a80
move input validation on X
trevorstephens 21544ce
fix output shape for multi-label classification
trevorstephens 610b5c5
update plot helper to support multi-output
trevorstephens dcbd0c6
update plot helper to pass-through output
trevorstephens 08ef804
Merge branch 'master' into partial_dep
NicolasHug 3f5c7f7
removed estimated method, small refactoring
NicolasHug 45e648d
factorized some test
NicolasHug ba4868f
some more refactoring
NicolasHug 414387b
test for _grid_from_X
NicolasHug f635693
few changes and comments
NicolasHug 58bfbad
some test + removed multioutput logic for now
NicolasHug cabb7f1
some more tests
NicolasHug 7e28cf5
removed support for multioutput multiclass and added back multioutput…
NicolasHug f84787d
better tests
NicolasHug 545ca6f
Removed support for RandomForestRegressor with recursion (does not
NicolasHug e086051
merged label and output into target
NicolasHug 137cd07
renamed exact into brute
NicolasHug b00a23d
renamin
NicolasHug 787e07f
some refactoring and tests
NicolasHug 39dffd7
some docs and tests
NicolasHug f9f7ee7
Added check for grid_resolution
NicolasHug 15e824d
docs
NicolasHug 2f34cc1
added deprecation in doc and used decorator
NicolasHug a3f6ed1
Merge branch 'master' into partial_dep
NicolasHug 36db441
added whatsnew entry
NicolasHug ef40ede
pep8
NicolasHug a766cad
added PR number to whatsnew
NicolasHug 828cca4
sorted dict keys for python2
NicolasHug e86bdab
trying to fix python37 issue
NicolasHug e26c0ac
removed use of dict for function dispatching
NicolasHug 763d151
filtered out warnings in tests
NicolasHug f5ff519
added test for multioutput
NicolasHug 32cafe8
fixed comment
NicolasHug 784277d
Fixed doctest
NicolasHug 2a58752
updated docstrings
NicolasHug a5285e1
put lazy imports in deprecated module
NicolasHug 9bcc0ca
Finished removing old support for RandomForest
NicolasHug 1c0b11d
fixed whatsnew
NicolasHug 8f016c6
removed unrelated change
NicolasHug 4bf6c90
small test refactoring
NicolasHug e628ae6
Merge branch 'master' into partial_dep
NicolasHug fa6eba7
pyt back ifmatplotlib dec
NicolasHug 634dc33
pep8
NicolasHug 2783b04
Merge branch 'master' into partial_dep
NicolasHug 512b353
addressed some comments
NicolasHug 736ba01
Merge branch 'master' into partial_dep
NicolasHug ef09e80
Added sanity check
NicolasHug 60b69b8
Added warnings about non constant init estimators
NicolasHug 34edf8f
Removed useless train_test_split from example
NicolasHug abd242b
Merge branch 'master' into partial_dep
NicolasHug 4975dc9
put back old versions in ensemble/partial_dependence.py to remove gri…
NicolasHug 995f4e9
Removed grid param from partial_dependence()
NicolasHug f817460
Merge branch 'master' into partial_dep
NicolasHug 46d3da4
Merge branch 'master' into partial_dep
NicolasHug 7a8fb44
Added MLPRegressor to example
NicolasHug 2f67a35
Remoed ax param and used fig instead
NicolasHug 2e1f926
Addressed comments
NicolasHug 56ac79e
minor docstring change
NicolasHug 4149a9c
Addressed comments
NicolasHug 69b95e9
Addressed comments from Joel
NicolasHug c7c4614
Merge branch 'master' into partial_dep
NicolasHug 4d7c062
Merge branch 'master' into partial_dep
NicolasHug 18c8f32
Merge branch 'partial_dep' of github.com:NicolasHug/scikit-learn into…
NicolasHug 01ab87c
rm blank line
NicolasHug ca8c0fd
moved into inspect module
NicolasHug d68ebc9
added sklearn/inspect/tests/__init__.py
NicolasHug 95de133
Merge branch 'master' into partial_dep
NicolasHug 5d09584
Hopefully fixes windows issue?
NicolasHug 592a589
Using add_subpackage
NicolasHug 5f4c317
Merge branch 'master' into partial_dep
NicolasHug 38c1c54
wording
NicolasHug 88ce5fa
Merge branch 'master' into partial_dep
NicolasHug 54ece6b
Added response parameter
NicolasHug aaeab44
indent
NicolasHug 28b936a
pep8
NicolasHug c40d472
Merge branch 'master' into partial_dep
NicolasHug 2c65b03
changed proba into predict_proba, decision into decision_function and…
NicolasHug 2bc0f0f
Updated references
NicolasHug 600cf7e
link to glossary terms
NicolasHug 91c7822
Merge remote-tracking branch 'upstream/master' into partial_dep
NicolasHug abb15b6
Addressed Joels comments
NicolasHug de2ebe5
Addressed Joels comments
NicolasHug fe8a026
Use pytest for exceptions and warnings
NicolasHug 07e8de4
Renamed inspect into model_selection
NicolasHug d33a732
created plot module and put plot_partial_dependence() there
NicolasHug 27a7eab
Merge branch 'master' of github.com:scikit-learn/scikit-learn into pa…
NicolasHug 8262419
Apply suggestions from code review
glemaitre 3464b7e
Addressed comments from Guillaume
NicolasHug 591471f
put everything in sklearn.inspection
NicolasHug 1c073ec
Merge branch 'partial_dep' of github.com:NicolasHug/scikit-learn into…
NicolasHug b89d5c4
removed model_inspection
NicolasHug f27f542
pep8
NicolasHug 9bca47c
plot_partial_dep doesnt return anything
NicolasHug ad326af
Ignored dep warning for new tests in ensemble
NicolasHug 3823fd2
ported sample_weight tests
NicolasHug ba06512
Merge remote-tracking branch 'upstream/master' into partial_dep
NicolasHug 2be41ce
docstring
NicolasHug 0b400c0
Merge branch 'master' of github.com:scikit-learn/scikit-learn into pa…
NicolasHug 743c838
Addressed comments
NicolasHug cb5166a
Apply suggestions from code review
glemaitre b2e3be7
Merge branch 'partial_dep' of github.com:NicolasHug/scikit-learn into…
NicolasHug f9cb127
forgot some merging conflicts
NicolasHug a8d7991
put back old test
NicolasHug 02e74ec
Update sklearn/utils/__init__.py
glemaitre bed53c1
Merge branch 'partial_dep' of github.com:NicolasHug/scikit-learn into…
NicolasHug 2b52051
Apply suggestions from code review
glemaitre 0a13f1b
comments
NicolasHug 14dbd2b
comments
NicolasHug 4a1b11d
Addressed comments
NicolasHug 92795a9
pep8
NicolasHug 30bffb3
Avoid re-computating quantiles
NicolasHug e0094b6
fixed example
NicolasHug d5b1559
removed warnings refs
NicolasHug 3fce2c4
MAINT: install matplotlib in conda latest build
glemaitre d22d7b2
Addressed comments
NicolasHug 80f53cb
Merge branch 'partial_dep' of github.com:NicolasHug/scikit-learn into…
NicolasHug 5677050
Merge remote-tracking branch 'upstream/master' into partial_dep
NicolasHug 27c261e
forgot if_mpl decorator
NicolasHug
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
.. include:: includes/big_toc_css.rst | ||
|
||
.. _inspection: | ||
|
||
Inspection | ||
---------- | ||
|
||
.. toctree:: | ||
|
||
modules/partial_dependence |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,129 @@ | ||
|
||
.. _partial_dependence: | ||
|
||
======================== | ||
Partial dependence plots | ||
======================== | ||
|
||
.. currentmodule:: sklearn.inspection | ||
|
||
Partial dependence plots (PDP) show the dependence between the target | ||
response [1]_ and a set of 'target' features, marginalizing over the values | ||
of all other features (the 'complement' features). Intuitively, we can | ||
interpret the partial dependence as the expected target response as a | ||
function of the 'target' features. | ||
|
||
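As a hedged aside, the "expected target response" reading above is commonly written as follows. The notation is an assumption introduced here and is not taken from this page: :math:`f` is the fitted model's response, :math:`x_S` the values of the target features, and :math:`x_C^{(i)}` the complement feature values of the :math:`i`-th of :math:`n` samples. The sample average on the right is the usual empirical estimate of the expectation.

```latex
\[
  \mathrm{pd}_{S}(x_S)
  \;=\; \mathbb{E}_{X_C}\!\left[ f(x_S, X_C) \right]
  \;\approx\; \frac{1}{n} \sum_{i=1}^{n} f\!\left(x_S, x_C^{(i)}\right)
\]
```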
Due to the limits of human perception, the size of the target feature set | |
must be small (usually one or two); the target features are therefore usually | |
chosen among the most important features. | |
|
||
The figure below shows four one-way and one two-way partial dependence plots | ||
for the California housing dataset, with a :class:`GradientBoostingRegressor | ||
<sklearn.ensemble.GradientBoostingRegressor>`: | ||
|
||
.. figure:: ../auto_examples/inspection/images/sphx_glr_plot_partial_dependence_002.png | ||
Review comment: We need to check the documentation to see if this is rendered properly (for the moment the README.txt is missing). |
||
:target: ../auto_examples/inspection/plot_partial_dependence.html | ||
:align: center | ||
:scale: 70 | ||
|
||
One-way PDPs tell us about the interaction between the target response and | ||
the target feature (e.g. linear, non-linear). The upper left plot in the | ||
above figure shows the effect of the median income in a district on the | ||
median house price; we can clearly see a linear relationship between them. Note | |
that PDPs assume that the target features are independent of the complement | |
features, and this assumption is often violated in practice. | ||
|
||
PDPs with two target features show the interactions between the two features. | |
For example, the two-variable PDP in the above figure shows the dependence | |
of median house price on joint values of house age and average occupants per | |
household. We can clearly see an interaction between the two features: for | |
an average occupancy greater than two, the house price is nearly independent of | |
the house age, whereas for values less than two there is a strong dependence | |
on age. | |
|
||
The :mod:`sklearn.inspection` module provides a convenience function | ||
:func:`plot_partial_dependence` to create one-way and two-way partial | ||
dependence plots. In the example below we show how to create a grid of | |
partial dependence plots: two one-way PDPs for the features ``0`` and ``1`` | ||
and a two-way PDP between the two features:: | ||
|
||
>>> from sklearn.datasets import make_hastie_10_2 | ||
>>> from sklearn.ensemble import GradientBoostingClassifier | ||
>>> from sklearn.inspection import plot_partial_dependence | ||
|
||
>>> X, y = make_hastie_10_2(random_state=0) | ||
>>> clf = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0, | ||
... max_depth=1, random_state=0).fit(X, y) | ||
>>> features = [0, 1, (0, 1)] | ||
>>> plot_partial_dependence(clf, X, features) #doctest: +SKIP | ||
|
||
You can access the newly created figure and Axes objects using ``plt.gcf()`` | ||
and ``plt.gca()``. | ||
|
||
For multi-class classification, you need to set the class label for which | ||
the PDPs should be created via the ``target`` argument:: | ||
|
||
>>> from sklearn.datasets import load_iris | ||
>>> iris = load_iris() | ||
>>> mc_clf = GradientBoostingClassifier(n_estimators=10, | ||
... max_depth=1).fit(iris.data, iris.target) | ||
>>> features = [3, 2, (3, 2)] | ||
>>> plot_partial_dependence(mc_clf, iris.data, features, target=0) #doctest: +SKIP | |
|
||
The same parameter ``target`` is used to specify the target in multi-output | ||
regression settings. | ||
|
||
If you need the raw values of the partial dependence function rather than | ||
the plots, you can use the | ||
:func:`sklearn.inspection.partial_dependence` function:: | ||
|
||
>>> from sklearn.inspection import partial_dependence | ||
|
||
>>> pdp, axes = partial_dependence(clf, X, [0]) | ||
>>> pdp # doctest: +ELLIPSIS | ||
array([[ 2.466..., 2.466..., ... | ||
>>> axes # doctest: +ELLIPSIS | ||
[array([-1.624..., -1.592..., ... | ||
|
||
The values at which the partial dependence should be evaluated are directly | |
generated from ``X``. For 2-way partial dependence, a 2D-grid of values is | |
generated. The ``values`` field returned by | |
:func:`sklearn.inspection.partial_dependence` (``axes`` in the example above) | |
gives the actual values used in the grid for each target feature. They also | |
correspond to the axes of the plots. | |
|
||
For each value of the 'target' features in the ``grid``, the partial | |
dependence function needs to marginalize the predictions of the estimator | |
over all possible values of the 'complement' features. With the ``'brute'`` | |
method, this is done by replacing the target feature values of every sample in | |
``X`` with those of the grid point, and computing the average prediction. | |
|
||
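The ``'brute'`` computation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not scikit-learn's implementation: the helper name ``brute_partial_dependence``, the toy model ``toy_predict`` and the demo data are all assumptions made for the example.

```python
import numpy as np


def brute_partial_dependence(predict, X, feature, grid):
    """Average prediction with column `feature` forced to each grid value.

    Hypothetical helper for illustration only (single target feature).
    """
    X = np.asarray(X, dtype=float)
    averaged = []
    for value in grid:
        X_eval = X.copy()           # leave the caller's data untouched
        X_eval[:, feature] = value  # overwrite the target feature in every row
        # Averaging over the rows marginalizes over the complement features.
        averaged.append(predict(X_eval).mean())
    return np.array(averaged)


def toy_predict(X):
    """Toy model (an assumption for the demo): f(x) = 2 * x0 + 3 * x1."""
    return 2.0 * X[:, 0] + 3.0 * X[:, 1]


rng = np.random.RandomState(0)
X_demo = rng.normal(size=(50, 2))
grid_demo = np.array([-1.0, 0.0, 1.0])
pd_demo = brute_partial_dependence(toy_predict, X_demo, feature=0, grid=grid_demo)
```

For this additive toy model, the result equals ``2 * grid_demo`` plus a constant (three times the mean of the second column), which makes the sketch easy to check by hand.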
In decision trees this can be evaluated efficiently without reference to the | ||
training data (``'recursion'`` method). For each grid point a weighted tree | ||
traversal is performed: if a split node involves a 'target' feature, the | ||
corresponding left or right branch is followed; otherwise both branches are | |
followed, each weighted by the fraction of training samples that | |
entered that branch. Finally, the partial dependence is given by a weighted | |
average of all visited leaves. Note that with the ``'recursion'`` method, | ||
``X`` is only used to generate the grid, not to compute the averaged | ||
predictions. The averaged predictions will always be computed on the data with | ||
which the trees were trained. | ||
|
||
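The weighted traversal used by the ``'recursion'`` method can be illustrated on a small hand-built tree. The sketch below is not scikit-learn's internal code: the dict-based node layout (``feature``, ``threshold``, ``left``, ``right``, ``n``, ``value``), the function name, and the convention that samples with ``x <= threshold`` go left are assumptions made for this example.

```python
def recursion_partial_dependence(node, target_values, weight=1.0):
    """Weighted tree traversal for one grid point.

    `target_values` maps each target feature index to its grid value.
    Hypothetical helper for illustration only.
    """
    if "value" in node:  # leaf: contribute its value, scaled by the path weight
        return weight * node["value"]
    left, right = node["left"], node["right"]
    if node["feature"] in target_values:
        # Split on a target feature: follow only the matching branch.
        go_left = target_values[node["feature"]] <= node["threshold"]
        return recursion_partial_dependence(
            left if go_left else right, target_values, weight)
    # Split on a complement feature: follow both branches, each weighted by
    # the fraction of training samples that entered it.
    total = left["n"] + right["n"]
    return (
        recursion_partial_dependence(left, target_values, weight * left["n"] / total)
        + recursion_partial_dependence(right, target_values, weight * right["n"] / total)
    )


# Toy tree: the root splits on complement feature 1; its left child splits on
# target feature 0. `n` is the number of training samples reaching each node.
toy_tree = {
    "feature": 1, "threshold": 0.5, "n": 10,
    "left": {
        "feature": 0, "threshold": 0.0, "n": 4,
        "left": {"value": 1.0, "n": 1},
        "right": {"value": 2.0, "n": 3},
    },
    "right": {"value": 5.0, "n": 6},
}

pd_low = recursion_partial_dependence(toy_tree, {0: -1.0})   # 0.4 * 1.0 + 0.6 * 5.0
pd_high = recursion_partial_dependence(toy_tree, {0: 1.0})   # 0.4 * 2.0 + 0.6 * 5.0
```

Note that no data matrix appears in the traversal: the weights come entirely from the training sample counts stored in the tree, which is why ``X`` is only needed to build the grid.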
|
||
.. rubric:: Footnotes | ||
|
||
.. [1] For classification, the target response may be the probability of a | ||
class (the positive class for binary classification), or the decision | ||
function. | ||
|
||
.. topic:: Examples: | ||
|
||
* :ref:`sphx_glr_auto_examples_inspection_plot_partial_dependence.py` | ||
|
||
Review comment: rm blank |
||
.. topic:: References | ||
|
||
.. [HTF2009] T. Hastie, R. Tibshirani and J. Friedman, `The Elements of | ||
Statistical Learning <https://web.stanford.edu/~hastie/ElemStatLearn//>`_, | ||
Second Edition, Section 10.13.2, Springer, 2009. | ||
|
||
.. [Mol2019] C. Molnar, `Interpretable Machine Learning | ||
<https://christophm.github.io/interpretable-ml-book/>`_, Section 5.1, 2019. |
Review comment: Please move ensemble.* to under Deprecated below