Skip to content

REF: Integrate symmetrization in _weighted_percentile to avoid double sorting #30894

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

REF: Integrate symmetrization in _weighted_percentile to avoid double sorting #30894

wants to merge 1 commit into from

Conversation

arjun-sundar3363
Copy link
Contributor

REF: Integrate symmetrization in _weighted_percentile to avoid double sorting

Description

This pull request refactors the computation of weighted percentiles by integrating symmetrization directly into the _weighted_percentile function. With this change, we avoid sorting the input array twice when computing the averaged weighted percentile. The following changes have been made:

  • Added a symmetrize parameter to _weighted_percentile that, when enabled, computes the averaged weighted percentile using both positive and negative arrays.
  • Updated _averaged_weighted_percentile to leverage the new symmetrization functionality.
  • Preserved the original functionality and all existing comments.
  • Ensured that the code complies with the scikit-learn contributing guidelines and passes all relevant tests.

This refactor improves efficiency without altering the external API or behavior.

Please review and let me know if any adjustments are required.

Copy link

github-actions bot commented Feb 25, 2025

❌ Linting issues

This PR is introducing linting issues. Here's a summary of the issues. Note that you can avoid having linting issues by enabling pre-commit hooks. Instructions to enable them can be found here.

You can see the details of the linting issues under the lint job here


black

black detected issues. Please run black . locally and push the changes. Here you can see the detected issues. Note that running black might also fix some of the issues which might be detected by ruff. Note that the installed black version is black=24.3.0.


--- /home/runner/work/scikit-learn/scikit-learn/sklearn/utils/stats.py	2025-03-05 08:47:18.288764+00:00
+++ /home/runner/work/scikit-learn/scikit-learn/sklearn/utils/stats.py	2025-03-05 08:47:44.970606+00:00
@@ -113,11 +113,11 @@
     adjusted_percentile_rank_rev = (100 - percentile_rank) / 100 * weight_cdf_rev[-1]
 
     mask_rev = adjusted_percentile_rank_rev == 0
     adjusted_percentile_rank_rev[mask_rev] = np.nextafter(
         adjusted_percentile_rank_rev[mask_rev],
-        adjusted_percentile_rank_rev[mask_rev] + 1
+        adjusted_percentile_rank_rev[mask_rev] + 1,
     )
 
     percentile_idx_rev = np.array(
         [
             np.searchsorted(weight_cdf_rev[:, i], adjusted_percentile_rank_rev[i])
would reformat /home/runner/work/scikit-learn/scikit-learn/sklearn/utils/stats.py

Oh no! 💥 💔 💥
1 file would be reformatted, 920 files would be left unchanged.

Generated for commit: eb2eb0a. Link to the linter CI: here

@ogrisel ogrisel self-requested a review February 25, 2025 08:42
@ogrisel
Copy link
Member

ogrisel commented Feb 25, 2025

Thanks for the PR. Before reviewing it, I would like to wait for the finalization of #29034 that is nearly ready to be merged and rebase this on top of main once it happens to solve potential conflicts.

@arjun-sundar3363
Copy link
Contributor Author

Thanks for the update. I’ll wait for #29034 to be finalized and will rebase my branch onto main as soon as it’s merged to resolve any conflicts. Let me know if there’s anything else you need from me in the meantime.

… sorting

- Add a  parameter to _weighted_percentile to compute the averaged weighted percentile.
- Update _averaged_weighted_percentile to use the new symmetrization functionality.
- This refactor avoids sorting the input array twice while preserving all existing functionality.
@arjun-sundar3363 arjun-sundar3363 deleted the refactor/weighted-percentile-symmetrization branch March 5, 2025 08:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants