RFC Semantic of sample_weight in regression metrics #15651

@glemaitre

Several metrics in scikit-learn use np.average(..., weights=sample_weight) under the hood:

  • mean_absolute_error
  • mean_squared_error
  • explained_variance_score
  • r2_score
  • mean_tweedie_deviance

When computing the average, NumPy divides by the sum of the weights. Is this the intended behaviour? For instance:

sample_weight = [1, 2, 3, 4]

and

sample_weight = [1/10, 2/10, 3/10, 4/10]

will lead to the same error/score. Dividing by the sum of the weights also removes any meaning attached to a unit (for instance, if sample_weight is expressed in a business unit).
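
To make the invariance concrete, here is a minimal sketch that calls np.average directly on the absolute errors. The y_true and y_pred arrays are made up for illustration, and this is only meant to mirror what a weighted mean absolute error would compute, not to reproduce scikit-learn's internals.

```python
import numpy as np

# Hypothetical targets and predictions, for illustration only.
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])
abs_errors = np.abs(y_true - y_pred)

# The two weight vectors from the example: same proportions, different scale.
w_int = np.array([1, 2, 3, 4])
w_frac = w_int / 10

# np.average divides the weighted sum by the sum of the weights,
# so a global rescaling of the weights cancels out.
mae_int = np.average(abs_errors, weights=w_int)
mae_frac = np.average(abs_errors, weights=w_frac)

print(mae_int, mae_frac)
print(np.isclose(mae_int, mae_frac))
```

Both calls should agree up to floating-point rounding, which is the behaviour being questioned here.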

So I was wondering whether we should multiply the average by the sum of the weights.
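
As a rough sketch of that alternative: multiplying the weighted average by the sum of the weights gives back the plain weighted sum of errors, which does depend on the scale (and hence the unit) of sample_weight. The helper name weighted_total_error is hypothetical and not part of scikit-learn, and the error values are made up.

```python
import numpy as np


def weighted_total_error(errors, sample_weight):
    """Hypothetical helper: weighted average times the sum of the weights.

    This is algebraically the weighted sum of the errors, so it keeps
    the scale of sample_weight instead of normalising it away.
    """
    errors = np.asarray(errors, dtype=float)
    sample_weight = np.asarray(sample_weight, dtype=float)
    return np.average(errors, weights=sample_weight) * sample_weight.sum()


# Made-up per-sample errors, for illustration only.
errors = [0.5, 0.5, 0.0, 1.0]

total_int = weighted_total_error(errors, [1, 2, 3, 4])
total_frac = weighted_total_error(errors, [0.1, 0.2, 0.3, 0.4])

# Unlike the plain average, the two results now differ by the factor of 10
# that separates the two weight vectors.
print(total_int, total_frac)
```

Under this variant the two weight vectors no longer give the same value, so the metric would carry the unit of the weights.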
