Remove median_absolute_error from METRICS_WITHOUT_SAMPLE_WEIGHT #30787

Open · wants to merge 13 commits into main
6 changes: 6 additions & 0 deletions doc/whats_new/upcoming_changes/sklearn.metrics/30787.fix.rst
@@ -0,0 +1,6 @@
- :func:`metrics.median_absolute_error` now uses `_averaged_weighted_percentile`
  instead of `_weighted_percentile` to calculate the median when `sample_weight`
  is not `None`. This is equivalent to using the "averaged_inverted_cdf" quantile
  method instead of "inverted_cdf", which gives results equivalent to
  `numpy.median` when equal weights are used.
  By :user:`Lucy Liu <lucyleeow>`
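
The following is a minimal NumPy sketch (not the scikit-learn internals; the
helper names are illustrative only) of why "averaged_inverted_cdf" reproduces
numpy.median under equal weights, while "inverted_cdf" picks the lower of the
two middle values for an even number of samples:

import numpy as np

def inverted_cdf_median(values, weights):
    # First value whose cumulative weight reaches 50% of the total weight.
    order = np.argsort(values)
    cdf = np.cumsum(weights[order])
    return values[order][np.searchsorted(cdf, 0.5 * cdf[-1])]

def averaged_inverted_cdf_median(values, weights):
    # Average the inverted CDF of the array and of its negation, mirroring
    # the idea behind _averaged_weighted_percentile.
    return 0.5 * (
        inverted_cdf_median(values, weights)
        - inverted_cdf_median(-values, weights)
    )

values = np.array([1.0, 2.0, 3.0, 4.0])
weights = np.ones_like(values)
print(inverted_cdf_median(values, weights))           # 2.0
print(averaged_inverted_cdf_median(values, weights))  # 2.5
print(np.median(values))                              # 2.5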
4 changes: 2 additions & 2 deletions sklearn/metrics/_regression.py
@@ -28,7 +28,7 @@
_xlogy as xlogy,
)
from ..utils._param_validation import Interval, StrOptions, validate_params
from ..utils.stats import _weighted_percentile
from ..utils.stats import _averaged_weighted_percentile, _weighted_percentile
from ..utils.validation import (
_check_sample_weight,
_num_samples,
@@ -923,7 +923,7 @@ def median_absolute_error(
if sample_weight is None:
output_errors = _median(xp.abs(y_pred - y_true), axis=0)
else:
output_errors = _weighted_percentile(
output_errors = _averaged_weighted_percentile(
xp.abs(y_pred - y_true), sample_weight=sample_weight
)
if isinstance(multioutput, str):
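A small sketch of the user-visible effect of the change above, assuming unit
weights; the "previously returned 2.0" note follows from the "inverted_cdf"
behavior described in the changelog entry:

import numpy as np
from sklearn.metrics import median_absolute_error

y_true = np.zeros(4)
y_pred = np.array([1.0, 2.0, 3.0, 4.0])
# Absolute errors are [1, 2, 3, 4]; their median is 2.5.
print(median_absolute_error(y_true, y_pred))  # 2.5 (unweighted, numpy.median)
print(median_absolute_error(y_true, y_pred, sample_weight=np.ones(4)))
# 2.5 with this fix; previously the weighted path returned 2.0, the
# "inverted_cdf" value.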
16 changes: 11 additions & 5 deletions sklearn/metrics/tests/test_common.py
@@ -555,7 +555,6 @@ def precision_recall_curve_padded_thresholds(*args, **kwargs):

# No Sample weight support
METRICS_WITHOUT_SAMPLE_WEIGHT = {
"median_absolute_error",
"max_error",
"ovo_roc_auc",
"weighted_ovo_roc_auc",
@@ -1474,9 +1473,10 @@ def test_averaging_multilabel_all_ones(name):
check_averaging(name, y_true, y_true_binarize, y_pred, y_pred_binarize, y_score)


def check_sample_weight_invariance(name, metric, y1, y2):
def check_sample_weight_invariance(name, metric, y1, y2, sample_weight=None):
rng = np.random.RandomState(0)
sample_weight = rng.randint(1, 10, size=len(y1))
if sample_weight is None:
sample_weight = rng.randint(1, 10, size=len(y1))

# top_k_accuracy_score always lead to a perfect score for k > 1 in the
# binary case
@@ -1552,7 +1552,11 @@ def check_sample_weight_invariance(name, metric, y1, y2):
if not name.startswith("unnormalized"):
# check that the score is invariant under scaling of the weights by a
# common factor
for scaling in [2, 0.3]:
# Due to floating point instability in the cumulative sum used by
# `median_absolute_error` (not fixable by `stable_cumsum`), the score is
# not always invariant when the weights are scaled by a float.
scaling_values = [2] if name == "median_absolute_error" else [2, 0.3]
lucyleeow (Member, Author) commented on May 30, 2025:
This is a problem due to numerical instability in the cumulative sum (NOT fixed
by stable_cumsum) in _weighted_percentile. It is rare for this problem to
appear.

In this test, the final value in cumulative_sum was a small amount (within the
tolerance of stable_cumsum) higher than the 'actual' value, which resulted in
adjusted_percentile_rank being a small amount higher than the 'actual' value:

adjusted_percentile_rank was 17.400000000000002 when it should have been 17.4,
which just happens to be a value in weight_cdf. Thus when we do searchsorted

xp.searchsorted(
    weight_cdf[feature_idx, ...], adjusted_percentile_rank[feature_idx]
)

the index should be that of the 17.4 element in weight_cdf, but instead it is
the next index. stable_cumsum does not fix this particular problem: no matter
how close the adjusted_percentile_rank value is to the true value, if the true
value is itself present within weight_cdf, searchsorted will return the
adjacent index.

Note that I checked using numpy.quantile (with "inverted_cdf", which now
supports weights) and got the same test failure.

For completeness, I will reference the recent discussion regarding the use of
stable_cumsum (#29431 (comment)): it was decided that it is not required in
_weighted_percentile (cc @ogrisel). Further, numpy's quantile implementation
simply uses cumsum.
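
A minimal NumPy sketch of this edge case; the weight_cdf values below are
invented for illustration, only the 17.4 rank comes from the description above:

import numpy as np

# Hypothetical cumulative weights; 17.4 is present exactly.
weight_cdf = np.array([3.1, 10.2, 17.4, 25.0, 34.8])

print(np.searchsorted(weight_cdf, 17.4))                # 2 -> the 17.4 element
print(np.searchsorted(weight_cdf, 17.400000000000002))  # 3 -> the next element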

A maintainer (Member) commented:
> Note that I checked using numpy.quantile (with "inverted_cdf", which now
> supports weights) and got the same test failure.

Can you check the behavior of numpy.quantile with "averaged_inverted_cdf" and
_averaged_weighted_percentile on this kind of edge case?

A maintainer (Member) added:
EDIT: I am not sure if the comment above was written before the switch from _weighted_percentile to _averaged_weighted_percentile or not.

lucyleeow (Member, Author) commented on Jun 4, 2025:

Sorry, I should have clarified that the failure persists with _averaged_weighted_percentile. I too thought _averaged_weighted_percentile should fix the problem, so I did some more digging.

_averaged_weighted_percentile errors with:

E           Max absolute difference among violations: 0.00704464
E           Max relative difference among violations: 0.0095152
E            ACTUAL: array(0.733313)
E            DESIRED: array(0.740357)

_weighted_percentile errors with:

E           Max absolute difference among violations: 0.01408929
E           Max relative difference among violations: 0.01903039
E            ACTUAL: array(0.726268)
E            DESIRED: array(0.740357)

So _averaged_weighted_percentile halves the error. The reason is:

  • with (+)array, the cumulative sum has the instability problem (i.e.
    adjusted_percentile_rank is 17.400000000000002)
  • with -array, the cumulative sum does not have the instability problem (i.e.
    adjusted_percentile_rank is 17.4) 🙃

If the cumulative sum had the instability problem with both array and -array, I
think the problem would have been fixed.

numpy.quantile with weights only supports inverted_cdf (I double-checked in the
dev docs https://numpy.org/devdocs/reference/generated/numpy.quantile.html and
numpy PRs).
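
For reference, the averaging trick amounts to something like the following (a
paraphrase of _averaged_weighted_percentile from sklearn.utils.stats, not the
exact implementation): when only one of the two one-sided percentiles hits the
off-by-one index, its error contributes only half to the average, which matches
the halved error seen above.

from sklearn.utils.stats import _weighted_percentile

def averaged_weighted_percentile(array, sample_weight, percentile_rank=50):
    # Averages the "inverted CDF" percentile of the array with the mirrored
    # percentile of its negation; an off-by-one on a single side therefore
    # contributes only half of its error to the result.
    return (
        _weighted_percentile(array, sample_weight, percentile_rank)
        - _weighted_percentile(-array, sample_weight, 100 - percentile_rank)
    ) / 2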

for scaling in scaling_values:
assert_allclose(
weighted_score,
metric(y1, y2, sample_weight=sample_weight * scaling),
@@ -1584,8 +1588,10 @@ def test_regression_sample_weight_invariance(name):
# regression
y_true = random_state.random_sample(size=(n_samples,))
y_pred = random_state.random_sample(size=(n_samples,))
sample_weight = np.arange(len(y_true))
lucyleeow (Member, Author) commented:
Changed as suggested in #30787 (comment). This works great.

I've amended check_sample_weight_invariance to optionally take sample_weight,
as I didn't want to change the sample_weight for all tests that use
check_sample_weight_invariance. WDYT @glemaitre?

metric = ALL_METRICS[name]
check_sample_weight_invariance(name, metric, y_true, y_pred)

check_sample_weight_invariance(name, metric, y_true, y_pred, sample_weight)


@pytest.mark.parametrize(
4 changes: 3 additions & 1 deletion sklearn/utils/stats.py
@@ -101,10 +101,12 @@ def _weighted_percentile(array, sample_weight, percentile_rank=50, xp=None):
for feature_idx in range(weight_cdf.shape[0])
],
)
print(f"{adjusted_percentile_rank=} {adjusted_percentile_rank[0]}")
print(f"{weight_cdf=}")
# In rare cases, `percentile_indices` equals to `sorted_idx.shape[0]`
max_idx = sorted_idx.shape[0] - 1
percentile_indices = xp.clip(percentile_indices, 0, max_idx)

print(f"XXXX {percentile_indices=}")
col_indices = xp.arange(array.shape[1], device=device)
percentile_in_sorted = sorted_idx[percentile_indices, col_indices]
