
partial_dependence ignores sample weights #13192


Closed
samronsin opened this issue Feb 19, 2019 · 2 comments · Fixed by #13193

Comments

@samronsin
Contributor

Description

When a GBT is trained with sample weights, the partial dependence plot completely ignores those sample weights.

Steps/Code to Reproduce

Create a dataset with two subpopulations: one where y = X[:,1] and the other where y = -X[:,1], so that without sample weights the two partial dependences cancel out.
Then give the first subpopulation a much larger sample weight than the second (e.g. a 100:1 ratio), so that the resulting model should reflect the dependence y = X[:,1].

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, partial_dependence

N = 1000000
# Feature 0: subpopulation indicator (0 or 1); feature 1: uniform on [0, 1).
X = np.vstack((np.random.randint(2, size=N), np.random.rand(N))).T

mask_0 = np.where(X[:, 0] == 0)
mask_1 = np.where(X[:, 0] == 1)

# Opposite dependences on X[:, 1] in the two subpopulations.
y = np.zeros(N)
y[mask_0] = X[:, 1][mask_0]
y[mask_1] = -X[:, 1][mask_1]

# Weight the first subpopulation 100:1 over the second.
sample_weight = np.zeros(N)
sample_weight[mask_0] = 100.
sample_weight[mask_1] = 1.

gbt = GradientBoostingRegressor()
gbt.fit(X, y, sample_weight=sample_weight)

grid = np.arange(0, 1, 0.01)
pdp = partial_dependence.partial_dependence(gbt, [1], grid=grid)

Expected Results

Partial dependence with sample weights should mostly reflect the points of the dataset where y = X[:,1]: the weighted average of the two dependences at a value v of X[:,1] is (100·v + 1·(−v)) / 101 ≈ 0.98·v, so the curve should track y = X[:,1] almost exactly.

[image: pdp_expected]

Actual Results

[image: pdp_master]

Versions

0.21.dev0

@jnothman
Member

Please see #12599 where partial_dependence is being deprecated. Does this issue apply there?

@samronsin
Contributor Author

samronsin commented Feb 21, 2019

Thanks, I had not caught up with this PR. Indeed, with the "recursion" method the same underlying _partial_dependence_tree function from ensemble/_gradient_boosting.pyx is called, so the issue remains.
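For reference, weighted partial dependence can be computed by hand with the "brute" idea: for each grid value, overwrite the target feature with that value, predict, and average the predictions with the sample weights. The sketch below is a hypothetical workaround, not the scikit-learn API; weighted_partial_dependence and SignModel (a toy stand-in for the fitted GBT in the reproduction above) are names I made up for illustration.

```python
import numpy as np

class SignModel:
    """Toy stand-in for the fitted regressor: y = x1 if x0 == 0 else -x1."""
    def predict(self, X):
        return np.where(X[:, 0] == 0, X[:, 1], -X[:, 1])

def weighted_partial_dependence(model, X, feature, grid, sample_weight):
    """For each grid value v, set X[:, feature] = v, predict,
    and average the predictions weighted by sample_weight."""
    X_mod = X.copy()
    pdp = []
    for v in grid:
        X_mod[:, feature] = v
        pdp.append(np.average(model.predict(X_mod), weights=sample_weight))
    return np.asarray(pdp)

# One row from each subpopulation, weighted 100:1 as in the report.
X = np.array([[0.0, 0.5], [1.0, 0.5]])
w = np.array([100.0, 1.0])
grid = np.arange(0.0, 1.0, 0.01)
pdp = weighted_partial_dependence(SignModel(), X, 1, grid, w)
# At each v the weighted average is (100*v - v) / 101 = (99/101)*v,
# so the curve tracks y = x1, matching the Expected Results above.
```

This is what the report's Expected Results plot corresponds to; the recursion method instead averages over tree leaves without the training sample weights, which is why the unweighted curve cancels out.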
