Several metrics in scikit-learn use np.average(..., weights=sample_weight)
under the hood:
mean_absolute_error
mean_squared_error
explained_variance_score
r2_score
mean_tweedie_deviance
When computing the average, numpy
divides by the sum of the weights. Is this the intended behaviour? For instance:
sample_weight = [1, 2, 3, 4]
and
sample_weight = [1/10, 2/10, 3/10, 4/10]
will lead to the same error/score. Dividing by the sum of the weights also removes any meaning attached to the unit of the weights (for instance, if sample_weight
is expressed in a business unit).
So I was wondering whether we should multiply the average by the sum of the weights or not.
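To make the question concrete, here is a minimal sketch using only numpy. The y_true and y_pred arrays are made-up toy values, and the weighted squared error is computed directly with np.average rather than through the scikit-learn metric, on the assumption that the metric reduces to this call:

```python
import numpy as np

# Toy data, chosen only for illustration (not from any real dataset).
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])
squared_error = (y_true - y_pred) ** 2

# The two weightings from the description: same proportions, different scale.
w = np.array([1, 2, 3, 4])
w_scaled = w / 10

# np.average divides by the sum of the weights, so the scale cancels out.
mse = np.average(squared_error, weights=w)
mse_scaled = np.average(squared_error, weights=w_scaled)
print(np.isclose(mse, mse_scaled))

# Multiplying back by the sum of the weights gives the weighted sum of errors,
# which does depend on the scale (and hence the unit) of the weights.
total = mse * w.sum()
total_scaled = mse_scaled * w_scaled.sum()
print(total, total_scaled)
```

The first comparison shows that the two weightings are indistinguishable after averaging, while the two totals differ by the same factor of 10 as the weights.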