Skip to content

ENH Reduce copying when centering PDPs #23076

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Apr 8, 2022

Conversation

thomasjpfan
Copy link
Member

@thomasjpfan thomasjpfan commented Apr 7, 2022

Reference Issues/PRs

Follow up to #18310

What does this implement/fix? Explain your changes.

Since pdp_lim already does the subtraction to compute the limits, I do not think we need to do the computation again in the private _plot methods.

Also I think kind="average" should center when centered=True, otherwise parts of the plot gets cut off. For example when running this:

Code snippet

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import QuantileTransformer
from sklearn.neural_network import MLPRegressor
from sklearn.inspection import PartialDependenceDisplay

X, y = fetch_california_housing(as_frame=True, return_X_y=True)
y -= y.mean()

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=0)

est = make_pipeline(
QuantileTransformer(),
MLPRegressor(
hidden_layer_sizes=(30, 15),
learning_rate_init=0.01,
early_stopping=True,
random_state=0,
),
)
est.fit(X_train, y_train)

common_params = {
"n_jobs": 2,
"grid_resolution": 10,
"centered": True,
"random_state": 0,
}

display = PartialDependenceDisplay.from_estimator(
est,
X_train,
features=["MedInc", "AveOccup", "HouseAge"],
kind="average",
**common_params,
)

display.figure_.suptitle("centered=True")
display.plot(centered=False)
_ = display.figure_.suptitle("centered=False")

Notice how on main the centered=True version has part of the plot cut offed.

main

Screen Shot 2022-04-07 at 4 51 54 PM

This PR

Screen Shot 2022-04-07 at 4 52 26 PM

@thomasjpfan thomasjpfan changed the title ENH Reduce copying when centering pdp ENH Reduce copying when centering PDPs Apr 7, 2022
@jeremiedbb
Copy link
Member

cc/ @glemaitre

Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking care of it. I think that I only have a nitpick that could avoid some code redundancy.

@glemaitre
Copy link
Member

Also I think kind="average" should center when centered=True, otherwise parts of the plot gets cut off. For example when running this:

Actually we did that in the former PR in one of the last commit :)

Copy link
Member

@jeremiedbb jeremiedbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the follow-up. LGTM once @glemaitre's comment is resolved

@jeremiedbb jeremiedbb added this to the 1.1 milestone Apr 8, 2022
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
@glemaitre glemaitre merged commit df692c0 into scikit-learn:main Apr 8, 2022
@glemaitre
Copy link
Member

Thanks @thomasjpfan

jjerphan pushed a commit to jjerphan/scikit-learn that referenced this pull request Apr 29, 2022
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants