
partial_dependence ignores sample weights #13192


Closed
samronsin opened this issue Feb 19, 2019 · 2 comments · Fixed by #13193

Comments

@samronsin
Contributor

Description

When a GBT is trained with sample weights, the partial dependence plot completely ignores those sample weights.

Steps/Code to Reproduce

Create a dataset with two subpopulations: one where y = X[:,1] and the other where y = -X[:,1], so that without sample weights the two partial dependences cancel out.
Then give the first subpopulation a much larger sample weight than the second (e.g. a 100:1 ratio), so that the resulting model should reflect the dependence y = X[:,1].

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, partial_dependence

N = 1000000
# Feature 0: subpopulation indicator (0 or 1); feature 1: uniform on [0, 1).
X = np.vstack((np.random.randint(2, size=N), np.random.rand(N))).T

mask_0 = np.where(X[:, 0] == 0)
mask_1 = np.where(X[:, 0] == 1)

# Opposite dependences on X[:, 1] in the two subpopulations.
y = np.zeros(N)
y[mask_0] = X[:, 1][mask_0]
y[mask_1] = -X[:, 1][mask_1]

# Weight the first subpopulation 100:1 over the second.
sample_weight = np.zeros(N)
sample_weight[mask_0] = 100.
sample_weight[mask_1] = 1.

gbt = GradientBoostingRegressor()
gbt.fit(X, y, sample_weight=sample_weight)

grid = np.arange(0, 1, 0.01)
pdp = partial_dependence.partial_dependence(gbt, [1], grid=grid)

Expected Results

Partial dependence with sample weights should mostly reflect the points of the dataset where y = X[:,1]: the weighted average of the two dependences at a value v of X[:,1] is (100·v + 1·(−v)) / 101 ≈ 0.98·v, so the curve should track y = X[:,1] almost exactly.

[image: pdp_expected]

Actual Results

[image: pdp_master]

Versions

0.21.dev0

@jnothman
Member

Please see #12599 where partial_dependence is being deprecated. Does this issue apply there?

@samronsin
Contributor Author

samronsin commented Feb 21, 2019

Thanks, I had not caught up with this PR. Indeed, with the "recursion" method the same underlying _partial_dependence_tree function from ensemble/_gradient_boosting.pyx is called, so the issue remains.
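For reference, weighted partial dependence can be computed by hand with the "brute" idea: for each grid value, overwrite the target feature with that value, predict, and average the predictions with the sample weights. The sketch below is a hypothetical workaround, not the scikit-learn API; weighted_partial_dependence and SignModel (a toy stand-in for the fitted GBT in the reproduction above) are names I made up for illustration.

```python
import numpy as np

class SignModel:
    """Toy stand-in for the fitted regressor: y = x1 if x0 == 0 else -x1."""
    def predict(self, X):
        return np.where(X[:, 0] == 0, X[:, 1], -X[:, 1])

def weighted_partial_dependence(model, X, feature, grid, sample_weight):
    """For each grid value v, set X[:, feature] = v, predict,
    and average the predictions weighted by sample_weight."""
    X_mod = X.copy()
    pdp = []
    for v in grid:
        X_mod[:, feature] = v
        pdp.append(np.average(model.predict(X_mod), weights=sample_weight))
    return np.asarray(pdp)

# One row from each subpopulation, weighted 100:1 as in the report.
X = np.array([[0.0, 0.5], [1.0, 0.5]])
w = np.array([100.0, 1.0])
grid = np.arange(0.0, 1.0, 0.01)
pdp = weighted_partial_dependence(SignModel(), X, 1, grid, w)
# At each v the weighted average is (100*v - v) / 101 = (99/101)*v,
# so the curve tracks y = x1, matching the Expected Results above.
```

This is what the report's Expected Results plot corresponds to; the recursion method instead averages over tree leaves without the training sample weights, which is why the unweighted curve cancels out.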
