6 changes: 6 additions & 0 deletions doc/whats_new/upcoming_changes/sklearn.metrics/30787.fix.rst
@@ -0,0 +1,6 @@
- :func:`metrics.median_absolute_error` now uses `_averaged_weighted_percentile`
  instead of `_weighted_percentile` to calculate the median when `sample_weight` is
  not `None`. This is equivalent to using the "averaged_inverted_cdf" quantile
  method instead of "inverted_cdf", which gives results equivalent to
  `numpy.median` when equal weights are used.
  By :user:`Lucy Liu <lucyleeow>`
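To illustrate the intent of the fix (a minimal sketch, not part of the patch): with uniform weights, the weighted median absolute error should now match the unweighted one, consistent with `numpy.median`:

```python
import numpy as np
from sklearn.metrics import median_absolute_error

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

unweighted = median_absolute_error(y_true, y_pred)
# With the fix, equal weights reproduce the unweighted result, because the
# "averaged_inverted_cdf" method agrees with numpy.median for equal weights.
weighted = median_absolute_error(y_true, y_pred, sample_weight=np.ones(len(y_true)))
assert np.isclose(unweighted, weighted)
```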
4 changes: 2 additions & 2 deletions sklearn/metrics/_regression.py
@@ -28,7 +28,7 @@
_xlogy as xlogy,
)
from ..utils._param_validation import Interval, StrOptions, validate_params
from ..utils.stats import _weighted_percentile
from ..utils.stats import _averaged_weighted_percentile, _weighted_percentile
from ..utils.validation import (
_check_sample_weight,
_num_samples,
@@ -923,7 +923,7 @@ def median_absolute_error(
if sample_weight is None:
output_errors = _median(xp.abs(y_pred - y_true), axis=0)
else:
output_errors = _weighted_percentile(
output_errors = _averaged_weighted_percentile(
xp.abs(y_pred - y_true), sample_weight=sample_weight
)
if isinstance(multioutput, str):
15 changes: 10 additions & 5 deletions sklearn/metrics/tests/test_common.py
@@ -555,7 +555,6 @@ def precision_recall_curve_padded_thresholds(*args, **kwargs):

# No Sample weight support
METRICS_WITHOUT_SAMPLE_WEIGHT = {
"median_absolute_error",
"max_error",
"ovo_roc_auc",
"weighted_ovo_roc_auc",
@@ -1474,9 +1473,10 @@ def test_averaging_multilabel_all_ones(name):
check_averaging(name, y_true, y_true_binarize, y_pred, y_pred_binarize, y_score)


def check_sample_weight_invariance(name, metric, y1, y2):
def check_sample_weight_invariance(name, metric, y1, y2, sample_weight=None):
rng = np.random.RandomState(0)
sample_weight = rng.randint(1, 10, size=len(y1))
if sample_weight is None:
sample_weight = rng.randint(1, 10, size=len(y1))

# top_k_accuracy_score always lead to a perfect score for k > 1 in the
# binary case
@@ -1552,7 +1552,10 @@ def check_sample_weight_invariance(name, metric, y1, y2):
if not name.startswith("unnormalized"):
# check that the score is invariant under scaling of the weights by a
# common factor
for scaling in [2, 0.3]:
# Due to floating point rounding in `cumulative_sum` within
# `median_absolute_error`, the score is not always invariant when the
# weights are scaled by a float.
scaling_values = [2] if name == "median_absolute_error" else [2, 0.3]
@lucyleeow (Member Author) commented on May 30, 2025:

This is a problem due to numerical instability in the cumulative sum (NOT fixed by `stable_cumsum`) in `_weighted_percentile`. It is rare for this problem to appear.

In this test, the final value of the cumulative sum was a small amount (within the tolerance used by `stable_cumsum`) higher than the 'actual' value. This resulted in `adjusted_percentile_rank` being a small amount higher than the 'actual' value:

`adjusted_percentile_rank` was 17.400000000000002 when it should have been 17.4, which just happens to be a value present in `weight_cdf`. Thus when we do the `searchsorted` lookup

    xp.searchsorted(
        weight_cdf[feature_idx, ...], adjusted_percentile_rank[feature_idx]
    )

the index should be that of the 17.4 element in `weight_cdf`, but instead it is the next index. `stable_cumsum` does not fix this particular problem: it does not matter how close the `adjusted_percentile_rank` value is to the true value; if the true value is itself present within `weight_cdf`, `searchsorted` will take the adjacent index.

Note that I checked using `numpy.quantile` (with "inverted_cdf", which now supports weights) and got the same test failure.

For completeness, I will reference the recent discussion regarding the use of `stable_cumsum` (#29431 (comment)): it was decided that it is not required in `_weighted_percentile` (cc @ogrisel). Further, numpy's quantile implementation simply uses `cumsum`.
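To make the failure mode concrete, here is a standalone reproduction of the `searchsorted` behaviour described above (illustrative values, not the actual test data):

```python
import numpy as np

weight_cdf = np.array([10.0, 17.4, 25.0])

# An exact match returns the index of the 17.4 element (side="left").
print(np.searchsorted(weight_cdf, 17.4))                # 1
# A rank inflated by a single rounding step lands on the next index instead.
print(np.searchsorted(weight_cdf, 17.400000000000002))  # 2
```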

A reviewer (Member) replied:

> Note that I checked using `numpy.quantile` (with "inverted_cdf", which now supports weights) and got the same test failure.

Can you check the behavior of `numpy.quantile` with "averaged_inverted_cdf" and `_averaged_weighted_percentile` on this kind of edge case?

A reviewer (Member) added:

EDIT: I am not sure if the comment above was written before the switch from `_weighted_percentile` to `_averaged_weighted_percentile` or not.

@lucyleeow (Member Author) replied on Jun 4, 2025:

Sorry, I should have clarified that the failure persists with `_averaged_weighted_percentile`. But I agree that I also thought `_averaged_weighted_percentile` should fix the problem, so I did some more digging.

`_averaged_weighted_percentile` errors with:

    E           Max absolute difference among violations: 0.00704464
    E           Max relative difference among violations: 0.0095152
    E            ACTUAL: array(0.733313)
    E            DESIRED: array(0.740357)

`_weighted_percentile` errors with:

    E           Max absolute difference among violations: 0.01408929
    E           Max relative difference among violations: 0.01903039
    E            ACTUAL: array(0.726268)
    E            DESIRED: array(0.740357)

So `_averaged_weighted_percentile` halves the error. The reason is that:

  • with `+array` the cumulative sum has the instability problem (i.e. `adjusted_percentile_rank` is 17.400000000000002)
  • with `-array` the cumulative sum does not have the instability problem (i.e. `adjusted_percentile_rank` is 17.4) 🙃

If the cumulative sum had the instability problem with both `array` and `-array`, I think the problem would have been fixed.

`numpy.quantile` with weights only supports "inverted_cdf" (I double-checked in the dev docs https://numpy.org/devdocs/reference/generated/numpy.quantile.html and in numpy PRs).
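For context, the averaging idea under discussion can be sketched as follows (a simplified 1-D illustration of the "averaged_inverted_cdf" approach, not a verbatim copy of `sklearn.utils.stats`). Averaging the percentile of `array` with the mirrored percentile of `-array` is why an instability that hits only one of the two cumulative sums has its error halved rather than cancelled:

```python
import numpy as np

def weighted_percentile(array, sample_weight, percentile_rank=50.0):
    # Simplified 1-D "inverted_cdf" lookup mirroring the structure described
    # above: sort, build the weight CDF with a plain cumsum, then searchsorted.
    order = np.argsort(array)
    sorted_array = array[order]
    weight_cdf = np.cumsum(sample_weight[order])
    adjusted_percentile_rank = percentile_rank / 100.0 * weight_cdf[-1]
    idx = np.searchsorted(weight_cdf, adjusted_percentile_rank)
    return sorted_array[min(idx, len(sorted_array) - 1)]

def averaged_weighted_percentile(array, sample_weight, percentile_rank=50.0):
    # Average the result on +array with the mirrored result on -array; a
    # rounding error in only one of the two cumulative sums is halved.
    return (
        weighted_percentile(array, sample_weight, percentile_rank)
        - weighted_percentile(-array, sample_weight, 100.0 - percentile_rank)
    ) / 2.0

# With equal weights this reproduces numpy.median, e.g. 2.5 for [1, 2, 3, 4]:
a = np.array([1.0, 2.0, 3.0, 4.0])
w = np.ones_like(a)
print(averaged_weighted_percentile(a, w))  # 2.5 == np.median(a)
```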

A reviewer (Member) replied:
🙃

@lucyleeow (Member Author) commented on Jun 19, 2025:

Just documenting a reminder to myself: when we eventually refactor `_averaged_weighted_percentile` to avoid sorting twice, we may no longer use `-array`, in which case `scale=0.3` may pass.

(I've been wanting to do the refactoring, but it has not reached the top of my todo list yet 😬)

for scaling in scaling_values:
assert_allclose(
weighted_score,
metric(y1, y2, sample_weight=sample_weight * scaling),
@@ -1584,8 +1587,10 @@ def test_regression_sample_weight_invariance(name):
# regression
y_true = random_state.random_sample(size=(n_samples,))
y_pred = random_state.random_sample(size=(n_samples,))
sample_weight = np.arange(len(y_true))
@lucyleeow (Member Author) commented:

Changed as suggested in #30787 (comment). This works great.

I've amended `check_sample_weight_invariance` to optionally take `sample_weight`, as I didn't want to change the `sample_weight` used by all the other tests that call `check_sample_weight_invariance`. WDYT @glemaitre?

metric = ALL_METRICS[name]
check_sample_weight_invariance(name, metric, y_true, y_pred)

check_sample_weight_invariance(name, metric, y_true, y_pred, sample_weight)


@pytest.mark.parametrize(