doc/comment-nan-sort-behaviour-weighted-percentile #31597

AHB30 · 2025-06-19T20:12:13Z

Adds a developer-facing comment to clarify that _weighted_percentile assumes array backends sort NaNs to the end, consistent with NumPy’s behaviour. According to the Array API specification, the sort order of NaNs is implementation-defined and not guaranteed. This clarification helps future maintainers preserve compatibility when integrating new array backends.
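To make the assumption concrete, here is a minimal NumPy-only sketch of the behaviour the comment describes. It is not the actual `_weighted_percentile` implementation. The toy `array` is invented for illustration, and the variable names `sorted_idx`, `largest_value_per_column` and `sorted_nan_mask` are borrowed from the code excerpt in the lint output.

```python
import numpy as np

# Toy 2-D array (illustrative only) with one NaN in each column.
array = np.array([[np.nan, 2.0], [1.0, np.nan], [3.0, 0.5]])

# NumPy places NaN after every finite value when sorting.
sorted_idx = np.argsort(array, axis=0)
sorted_array = np.take_along_axis(array, sorted_idx, axis=0)

# The last row of each sorted column therefore holds the NaN, if there is one.
largest_value_per_column = sorted_array[-1, :]
has_nan = bool(np.any(np.isnan(largest_value_per_column)))

# Reordering the NaN mask with the same indices shows where the NaNs ended up.
sorted_nan_mask = np.take_along_axis(np.isnan(array), sorted_idx, axis=0)
print(sorted_nan_mask)
```

If a backend sorted NaNs to the front instead, checking only the last row would miss them, which is why the assumption matters.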


github-actions · 2025-06-19T20:13:08Z

❌ Linting issues

This PR is introducing linting issues. Here's a summary of the issues. Note that you can avoid having linting issues by enabling pre-commit hooks. Instructions to enable them can be found here.

You can see the details of the linting issues under the lint job here

`ruff check`

`ruff` detected issues. Please run `ruff check --fix --output-format=full` locally, fix the remaining issues, and push the changes. Here you can see the detected issues. Note that the installed `ruff` version is `ruff=0.11.7`.


sklearn/utils/stats.py:74:89: E501 Line too long (93 > 88)
   |
72 |     ]
73 |     # NaN values get sorted to end (largest value)
74 |     # IMPORTANT: This assumes that all supported array libraries (e.g., NumPy, PyTorch, CuPy)
   |                                                                                         ^^^^^ E501
75 |     # sort NaN values to the end — like NumPy. This behaviour is currently true but not guaranteed
76 |     # by the Array API specification, which explicitly states sort order of NaNs is
   |

sklearn/utils/stats.py:75:89: E501 Line too long (98 > 88)
   |
73 |     # NaN values get sorted to end (largest value)
74 |     # IMPORTANT: This assumes that all supported array libraries (e.g., NumPy, PyTorch, CuPy)
75 |     # sort NaN values to the end — like NumPy. This behaviour is currently true but not guaranteed
   |                                                                                         ^^^^^^^^^^ E501
76 |     # by the Array API specification, which explicitly states sort order of NaNs is
77 |     # implementation-defined: https://data-apis.org/array-api/latest/API_specification/sorting_functions.html
   |

sklearn/utils/stats.py:78:89: E501 Line too long (93 > 88)
   |
76 |     # by the Array API specification, which explicitly states sort order of NaNs is
77 |     # implementation-defined: https://data-apis.org/array-api/latest/API_specification/sorting_functions.html
78 |     # Revisit this assumption if adding support for new array libraries (e.g. JAX, Ivy, etc.)
   |                                                                                         ^^^^^ E501
79 |     if xp.any(xp.isnan(largest_value_per_column)):
80 |         sorted_nan_mask = xp.take_along_axis(xp.isnan(array), sorted_idx, axis=0)
   |

Found 3 errors.

Generated for commit: 71dc265. Link to the linter CI: here
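Long comment lines such as these usually have to be rewrapped by hand. As one possible aid, the sketch below uses the standard-library `textwrap` module to rewrap the comment text to the 88-column limit reported above. The comment wording is paraphrased from the lint excerpt, and `MAX_LINE_LENGTH`, `INDENT` and `comment_text` are illustrative names, not part of the PR.

```python
import textwrap

MAX_LINE_LENGTH = 88  # limit reported by E501 in the lint output
INDENT = "    # "  # four-space indent plus the comment marker

# Comment text paraphrased from the lint excerpt (illustrative only).
comment_text = (
    "IMPORTANT: This assumes that all supported array libraries (e.g., NumPy, "
    "PyTorch, CuPy) sort NaN values to the end, like NumPy. This behaviour is "
    "currently true but not guaranteed by the Array API specification, which "
    "explicitly states sort order of NaNs is implementation-defined: "
    "https://data-apis.org/array-api/latest/API_specification/sorting_functions.html "
    "Revisit this assumption if adding support for new array libraries "
    "(e.g. JAX, Ivy, etc.)"
)

# Keep the URL and hyphenated words intact; the width includes the indent.
wrapped = textwrap.fill(
    comment_text,
    width=MAX_LINE_LENGTH,
    initial_indent=INDENT,
    subsequent_indent=INDENT,
    break_long_words=False,
    break_on_hyphens=False,
)
print(wrapped)
```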

StefanieSenger

Hello @AHB30 and thanks for your PR.

The comment is correct, and this was discussed in the PRs that implemented Array API support for `_weighted_percentile` (#29431). However, I think the previous comment, `# NaN values get sorted to end (largest value)`, is enough of a hint for developers who encounter test failures after adding support for another array library. Hence I would not add this long comment.

What do you think, @EmilyXinyi and @lucyleeow?

lucyleeow · 2025-06-21T09:07:35Z

Yeah I think the open issue is adequate and I wouldn't add a long comment in the code. I daresay the issue would be easier to find than the comment when we decide to add new array support...
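For whoever picks this up when a new array library is added, a small check along the following lines could confirm the NaN ordering. This is a sketch only: `sorts_nan_last` is a hypothetical helper, not a scikit-learn API, and it assumes the namespace `xp` exposes `asarray`, `sort`, `argsort`, `isnan`, `all` and `any`. It is exercised here with NumPy alone.

```python
import numpy as np


def sorts_nan_last(xp):
    """Return True if `xp.sort` and `xp.argsort` place NaN after finite values.

    Hypothetical helper for illustration; `xp` is any array namespace
    providing `asarray`, `sort`, `argsort`, `isnan`, `all` and `any`.
    """
    values = xp.asarray([float("nan"), 1.0, float("nan"), -2.0, 3.0])
    n_nan = 2
    sorted_values = xp.sort(values)
    sorted_idx = xp.argsort(values)
    # The trailing n_nan entries must be NaN and everything before them finite.
    tail_is_nan = bool(xp.all(xp.isnan(sorted_values[-n_nan:])))
    head_is_finite = not bool(xp.any(xp.isnan(sorted_values[:-n_nan])))
    # argsort must agree: the last n_nan indices point at the NaN positions.
    idx_tail = {int(i) for i in sorted_idx[-n_nan:]}
    return tail_is_nan and head_is_finite and idx_tail == {0, 2}


print(sorts_nan_last(np))
```

Passing a different namespace in place of `np` would test that library under the same assumption.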

github-actions bot added the module:utils label Jun 19, 2025

StefanieSenger reviewed Jun 21, 2025
