
Add sample_weight support for QuantileTransformer when fit on dense data #31147


Open: wants to merge 5 commits into base: main

kaekkr commented Apr 4, 2025

Reference Issues/PRs

Fixes #30707
See also the discussion in #30707.

What does this implement/fix? Explain your changes.

This PR adds support for the sample_weight parameter to QuantileTransformer, allowing users to weight samples when computing quantiles. This makes the transformation more flexible, for example when samples have different importance or when an imbalanced dataset needs to be reweighted.

Changes made:

  • Added sample_weight parameter to fit and _dense_fit.
  • Implemented the weighted quantile computation.
  • Updated tests to check for correct behavior with and without sample_weight.
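The description does not show the weighted quantile code itself. As a rough illustration of what a weighted quantile can look like, here is a minimal NumPy sketch that assumes the "averaged inverted CDF" definition (NumPy's `method="averaged_inverted_cdf"`). The helper name `weighted_percentile_aicdf` is hypothetical; it is not scikit-learn's `_averaged_weighted_percentile` and not necessarily what this PR implements.

```python
import numpy as np


def weighted_percentile_aicdf(x, sample_weight, percentile):
    """Weighted percentile with "averaged inverted CDF" semantics (sketch).

    Hypothetical helper for illustration only. With integer weights it is
    meant to agree with np.percentile(..., method="averaged_inverted_cdf")
    on the data where each sample is repeated `weight` times.
    Assumes at least one strictly positive weight.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(sample_weight, dtype=float)
    keep = w > 0  # zero-weight samples carry no probability mass
    x, w = x[keep], w[keep]
    order = np.argsort(x, kind="stable")
    x, w = x[order], w[order]
    cdf = np.cumsum(w)  # cumulative weight at each sorted value
    target = percentile / 100 * cdf[-1]
    # Inverted CDF: first sorted value whose cumulative weight reaches target.
    i = min(int(np.searchsorted(cdf, target, side="left")), len(x) - 1)
    if cdf[i] == target:
        # Target lands exactly on a CDF step: average with the next value.
        # Exact float comparison; a robust version may need a tolerance.
        j = min(i + 1, len(x) - 1)
        return float(0.5 * (x[i] + x[j]))
    return float(x[i])


x = [3.0, 1.0, 4.0, 1.5, 5.0]
w = [2, 1, 3, 1, 1]
print(weighted_percentile_aicdf(x, w, 50))  # 3.5
```

With integer weights, the result should match `np.percentile(np.repeat(x, w), q, method="averaged_inverted_cdf")`, i.e. weighting a sample by `k` behaves like repeating it `k` times.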

Any other comments?

  • The change is backward compatible: sample_weight is optional.
  • Existing tests pass, and the previous behavior is preserved when sample_weight is not provided.
  • Would appreciate feedback on edge cases or numerical accuracy concerns.

Thanks for the review!


github-actions bot commented Apr 4, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 06960cf.

ogrisel (Member) commented Apr 4, 2025

Thanks for the PR. Could you please use sklearn.utils.stats._averaged_weighted_percentile instead of reimplementing weighted quantiles?

For the unweighted case, we should use np.nanquantile/np.percentile with method="averaged_inverted_cdf" instead.

The two changes together should help make the weighting/repetition semantic check of check_sample_weight_equivalence_on_dense_data pass.
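To illustrate why `method="averaged_inverted_cdf"` matters for the repetition semantic, the sketch below compares per-column quantile landmarks from `np.nanpercentile` on a small dense array and on the same array with every row repeated three times (the unweighted analogue of a uniform integer weight of 3). The landmark grid `references` and the helper `landmarks` are illustrative assumptions loosely modeled on how QuantileTransformer summarizes each feature by a grid of percentiles; they are not the estimator's actual code.

```python
import numpy as np

# Small dense dataset with one missing value (NaNs are ignored per column).
rng = np.random.default_rng(0)
X = rng.normal(size=(7, 3))
X[2, 1] = np.nan

# Illustrative grid of percentile landmarks: 0, 10, ..., 100.
references = np.linspace(0, 100, 11)


def landmarks(X, method):
    """Per-feature quantile landmarks, one column per feature."""
    return np.nanpercentile(X, references, axis=0, method=method)


# Repeating every row 3 times mimics an integer sample weight of 3 per row.
X_rep = np.repeat(X, 3, axis=0)

for method in ("linear", "averaged_inverted_cdf"):
    unchanged = np.allclose(landmarks(X, method), landmarks(X_rep, method))
    print(method, unchanged)
```

The averaged inverted CDF depends only on the empirical CDF, so uniform repetition leaves its landmarks unchanged, whereas the default linear interpolation shifts them. Running the loop should therefore report `False` for `"linear"` and `True` for `"averaged_inverted_cdf"`.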

Please mark check_sample_weight_equivalence_on_sparse_data XFAIL in the PER_ESTIMATOR_XFAIL_CHECKS dict.
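PER_ESTIMATOR_XFAIL_CHECKS maps an estimator class to a dict of check names and the reasons they are expected to fail. The fragment below sketches what an added entry might look like if QuantileTransformer has no entry yet; the file location (assumed here to be sklearn/utils/_test_common/instance_generator.py) and the reason string are assumptions, and the real dict contains many other entries.

```python
# Assumed location: sklearn/utils/_test_common/instance_generator.py
PER_ESTIMATOR_XFAIL_CHECKS = {
    # ... existing entries for other estimators ...
    QuantileTransformer: {
        "check_sample_weight_equivalence_on_sparse_data": (
            "sample_weight is not yet supported for sparse input."
        ),
    },
}
```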

ogrisel changed the title from "Quantile transformer sample weight" to "Add sample_weight support for QuantileTransformer when fit on dense data" on Apr 4, 2025
ogrisel (Member) commented Apr 4, 2025

Please also don't forget to document your change in a changelog entry by adding a file under doc/whats_new/upcoming_changes. See #29907 for an example.

kaekkr (Author) commented Apr 4, 2025

@ogrisel Okay, thank you for your reply! I will fix that.

Development

Successfully merging this pull request may close these issues.

Add sample_weight support to QuantileTransformer