Skip to content

doc/comment-nan-sort-behaviour-weighted-percentile #31597

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

AHB30
Copy link

@AHB30 AHB30 commented Jun 19, 2025

Adds a developer-facing comment to clarify that _weighted_percentile assumes array backends sort NaNs to the end, consistent with NumPy’s behaviour. According to the Array API specification, the sort order of NaNs is implementation-defined and not guaranteed. This clarification helps future maintainers preserve compatibility when integrating new array backends.

Reference Issues/PRs

Adds a developer-facing comment to clarify that _weighted_percentile assumes array backends sort NaNs to the end, consistent with NumPy’s behaviour. According to the Array API specification, the sort order of NaNs is implementation-defined and not guaranteed. This clarification helps future maintainers preserve compatibility when integrating new array backends.
Copy link

❌ Linting issues

This PR is introducing linting issues. Here's a summary of the issues. Note that you can avoid having linting issues by enabling pre-commit hooks. Instructions to enable them can be found here.

You can see the details of the linting issues under the lint job here


ruff check

ruff detected issues. Please run ruff check --fix --output-format=full locally, fix the remaining issues, and push the changes. Here you can see the detected issues. Note that the installed ruff version is ruff=0.11.7.


sklearn/utils/stats.py:74:89: E501 Line too long (93 > 88)
   |
72 |     ]
73 |     # NaN values get sorted to end (largest value)
74 |     # IMPORTANT: This assumes that all supported array libraries (e.g., NumPy, PyTorch, CuPy)
   |                                                                                         ^^^^^ E501
75 |     # sort NaN values to the end — like NumPy. This behaviour is currently true but not guaranteed
76 |     # by the Array API specification, which explicitly states sort order of NaNs is
   |

sklearn/utils/stats.py:75:89: E501 Line too long (98 > 88)
   |
73 |     # NaN values get sorted to end (largest value)
74 |     # IMPORTANT: This assumes that all supported array libraries (e.g., NumPy, PyTorch, CuPy)
75 |     # sort NaN values to the end — like NumPy. This behaviour is currently true but not guaranteed
   |                                                                                         ^^^^^^^^^^ E501
76 |     # by the Array API specification, which explicitly states sort order of NaNs is
77 |     # implementation-defined: https://data-apis.org/array-api/latest/API_specification/sorting_functions.html
   |

sklearn/utils/stats.py:78:89: E501 Line too long (93 > 88)
   |
76 |     # by the Array API specification, which explicitly states sort order of NaNs is
77 |     # implementation-defined: https://data-apis.org/array-api/latest/API_specification/sorting_functions.html
78 |     # Revisit this assumption if adding support for new array libraries (e.g. JAX, Ivy, etc.)
   |                                                                                         ^^^^^ E501
79 |     if xp.any(xp.isnan(largest_value_per_column)):
80 |         sorted_nan_mask = xp.take_along_axis(xp.isnan(array), sorted_idx, axis=0)
   |

Found 3 errors.

Generated for commit: 71dc265. Link to the linter CI: here

Copy link
Contributor

@StefanieSenger StefanieSenger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @AHB30 and thanks for your PR.

The comment is correct and it was discussed like that on the PRs that had implemented array-api on _weighted_percentile (#29431). However, I actually think that the previous comment # NaN values get sorted to end (largest value) is enough as a hint to developers who encounter test failures after adding support for another array library. And hence I would not add this long comment.

What do you think, @EmilyXinyi and @lucyleeow?

@lucyleeow
Copy link
Member

Yeah I think the open issue is adequate and I wouldn't add a long comment in the code. I daresay the issue would be easier to find than the comment when we decide to add new array support...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants