
Unclear error message for some metrics when the shape of sample_weight is inappropriate #9870

@qinhanmin2014

Description

Following up on @lesteve's comment in #9786:

In this PR we spotted a place where check_consistent_length(X, y) was used where check_consistent_length(X, y, sample_weight) should have been called. It would be good to double-check that this error is not present in some other places in our codebase.

In #9786, we found that roc_auc scores do not raise an error when the shape of sample_weight is not [n_samples] for binary y_true, and we fixed the problem for both roc_auc scores and average_precision scores. (For average_precision scores, the regression test will be added in #9829, after we move average_precision scores out of METRIC_UNDEFINED_BINARY.)
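
For context, the pattern from the quoted comment amounts to passing sample_weight into the length check as well. A minimal sketch of that idea (the metric name and signature here are generic placeholders, not the exact code from #9786):

from sklearn.utils import check_consistent_length

def some_metric(y_true, y_score, sample_weight=None):
    # Including sample_weight in the length check makes an inconsistent
    # sample_weight fail early; check_consistent_length skips None arguments.
    check_consistent_length(y_true, y_score, sample_weight)
    ...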

It now seems that test_common.py can guarantee that an error is raised when the shape of sample_weight is inappropriate, but the problem is that for some metrics the error message is not clear, because we do not actually check the shape of sample_weight. We simply rely on some other statements to stop the program, e.g.:

from sklearn.metrics import roc_auc_score, log_loss, brier_score_loss, r2_score

roc_auc_score([0, 1, 0], [1, 0, 1], sample_weight=[1, 1])
# ValueError: Found input variables with inconsistent numbers of samples: [3, 3, 2]
log_loss([0, 1, 0], [1, 0, 1], sample_weight=[1, 1])
brier_score_loss([0, 1, 0], [1, 0, 1], sample_weight=[1, 1])
# Axis must be specified when shapes of a and weights differ.
r2_score([0, 1, 0], [1, 0, 1], sample_weight=[1, 1])
# operands could not be broadcast together with shapes (2,1) (3,1)
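
As for the message, one option would be to validate the shape of sample_weight explicitly in each metric, so the error names sample_weight instead of surfacing a NumPy broadcasting failure. A rough sketch (the helper _check_sample_weight_shape below is hypothetical, not something in the codebase):

import numpy as np

def _check_sample_weight_shape(sample_weight, y_true):
    # Hypothetical helper: complain explicitly about sample_weight when its
    # shape does not match (n_samples,).
    sample_weight = np.asarray(sample_weight)
    if sample_weight.ndim != 1 or sample_weight.shape[0] != len(y_true):
        raise ValueError(
            "sample_weight has shape %r, expected (%d,) to match y_true"
            % (sample_weight.shape, len(y_true)))
    return sample_weight

_check_sample_weight_shape([1, 1], [0, 1, 0])
# ValueError: sample_weight has shape (2,), expected (3,) to match y_true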

WDYT? Is it worth fixing? If so, what message should we provide? Thanks.
cc @jnothman @lesteve
