Description
Following up on @lesteve's comment in #9786:
"In this PR we spotted a place where check_consistent_length(X, y) was used where check_consistent_length(X, y, sample_weight) should have been called. It would be good to double-check that this error is not present in other places in our codebase."
In #9786, we found that the roc_auc scores do not raise an error when the shape of sample_weight is not [n_samples] for binary y_true, and we fixed the problem for both roc_auc scores and average_precision scores. (For average_precision scores, the regression test will be added in #9829, after we move average_precision scores out of METRIC_UNDEFINED_BINARY.)
It now seems that test_common.py can guarantee that an error is raised when the shape of sample_weight is inappropriate. The problem is that for some metrics the error message is unclear, because we do not actually check the shape of sample_weight; we simply rely on some other statement to fail. E.g.:
roc_auc_score([0, 1, 0], [1, 0, 1], sample_weight=[1, 1])
# ValueError: Found input variables with inconsistent numbers of samples: [3, 3, 2]
log_loss([0, 1, 0], [1, 0, 1], sample_weight=[1, 1])
brier_score_loss([0, 1, 0], [1, 0, 1], sample_weight=[1, 1])
# Axis must be specified when shapes of a and weights differ.
metrics.r2_score([0, 1, 0], [1, 0, 1], sample_weight=[1, 1])
# operands could not be broadcast together with shapes (2,1) (3,1)
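For discussion, here is a minimal sketch of what an explicit check could look like. The helper name check_sample_weight_length and the wording of the message are hypothetical, not existing scikit-learn API; the sketch only compares lengths and does not validate dtype or dimensionality.

```python
def check_sample_weight_length(y_true, sample_weight):
    """Hypothetical helper: fail early with an explicit message when
    sample_weight does not have one entry per sample."""
    if sample_weight is None:
        # No weights given, nothing to validate.
        return
    n_samples = len(y_true)
    n_weights = len(sample_weight)
    if n_weights != n_samples:
        # Message wording is only a suggestion.
        raise ValueError(
            "sample_weight has %d entries, but y_true has %d samples; "
            "expected sample_weight of shape (n_samples,)."
            % (n_weights, n_samples)
        )


# Matching lengths pass silently; a mismatch raises a ValueError
# that names sample_weight explicitly.
check_sample_weight_length([0, 1, 0], [1, 1, 1])
try:
    check_sample_weight_length([0, 1, 0], [1, 1])
except ValueError as exc:
    print(exc)
```

Calling something like this at the top of each metric would make the failure mention sample_weight directly, instead of surfacing a broadcasting or axis error from further down.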
WDYT? Is it worth fixing? If so, what message should we provide? Thanks.
cc @jnothman @lesteve