Skip to content

ENH Use scipy Yeo-Johnson implementation in PowerTransformer for scipy >= 1.9 #31227

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 19 commits into from
May 5, 2025

Conversation

yaichm
Copy link
Contributor

@yaichm yaichm commented Apr 19, 2025

Reference Issues/PRs

This PR is finalizing #27818 to close #26308 and also fix the warning issue reported in #23319 (comment)

Close #27818

What does this implement/fix? Explain your changes.

Use scipy.stats.yeojohnson instead of our own implementation as @lorentzenchr suggested.

Any other comments?

Copy link

github-actions bot commented Apr 19, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: ab2d918. Link to the linter CI: here

Copy link
Member

@lorentzenchr lorentzenchr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yaichm @smarie @eroussama @Dlimim @HamzaLuffy @AmineHannoun @xuefeng-xu thanks for this nice PR. Seems solid. Only a few nitpicks.

@@ -80,6 +81,39 @@ def _sparse_linalg_cg(A, b, **kwargs):
return scipy.sparse.linalg.cg(A, b, **kwargs)


#TODO : remove this when minimum version of scipy >= 1.9.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#TODO : remove this when minimum version of scipy >= 1.9.0
# TODO: Remove this when minimum version of scipy >= 1.9.0, also remove
# sklearn.preprocessing._yeo_johnson_optimize and use scipy.stats.yeojohnson instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we still need _yeo_johnson_optimize since we need to get rid of the nan, see also _box_cox_optimize

@lorentzenchr lorentzenchr added this to the 1.7 milestone Apr 21, 2025
@lorentzenchr
Copy link
Member

@smarie Could you instruct how to use linting tools and the CI reports?

@smarie
Copy link
Contributor

smarie commented Apr 28, 2025

@smarie Could you instruct how to use linting tools and the CI reports?

Yes, good catch thanks @lorentzenchr . We reviewed the pre-commit tool together from https://pre-commit.com/#4-optional-run-against-all-the-files

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just a nitpick.

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, there are other small details to fix:

@@ -0,0 +1,5 @@
- Fixed numerical stability (mostly overflows) of the Yeo-Johnson transform with
Copy link
Contributor

@smarie smarie May 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ogrisel @lorentzenchr I suggest to explicitly tell about the replacement of implem, what do you think ? Like this:

Suggested change
- Fixed numerical stability (mostly overflows) of the Yeo-Johnson transform with
- Now using ``scipy.stats.yeojohnson`` instead of our own implementation of the Yeo-Johnson transform. Fixed numerical stability (mostly overflows) with

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We all agree, but I do not have the permission to accept this suggestion myself. @yaichm has to do it I guess.

@ogrisel
Copy link
Member

ogrisel commented May 5, 2025

Once done, we can merge.

Comment on lines 4 to 6
Initial PR by :user: `Xuefeng Xu <@xuefeng-xu>` completed by `Mohamed Yaich<@yaichm>`,
`Oussama Er-rabie<@eroussama>`, `Mohammed Yaslam Dlimi<@Dlimim>`,
`Hamza Zaroual<@HamzaLuffy>`, `Amine Hannoun<@AmineHannoun>` and `Sylvain Marié<@smarie>`.
Copy link
Member

@lesteve lesteve May 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need something like this, please double-check the rendering in the changelog by clicking on "check the rendered doc" CI status at the bottom of the PR.

Suggested change
Initial PR by :user: `Xuefeng Xu <@xuefeng-xu>` completed by `Mohamed Yaich<@yaichm>`,
`Oussama Er-rabie<@eroussama>`, `Mohammed Yaslam Dlimi<@Dlimim>`,
`Hamza Zaroual<@HamzaLuffy>`, `Amine Hannoun<@AmineHannoun>` and `Sylvain Marié<@smarie>`.
Initial PR by :user:`Xuefeng Xu <xuefeng-xu>` completed by :user:`Mohamed Yaich <yaichm>`,
:user:`Oussama Er-rabie <eroussama>`, :user:`Mohammed Yaslam Dlimi <Dlimim>`,
:user:`Hamza Zaroual <HamzaLuffy>`, :user:`Amine Hannoun <AmineHannoun>` and :user:`Sylvain Marié <smarie>`.

The current rendering is slightly off, see this

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, the changelog rendering looks good now see this

@lesteve lesteve changed the title ENH: Finalize Yeo-Johnson implementation with scipy.stats in PowerTransformer (#27818) ENH Use scipy Yeo-Johnson implementation in PowerTransformer for scipy >= 1.9 (#27818) May 5, 2025
@lesteve lesteve changed the title ENH Use scipy Yeo-Johnson implementation in PowerTransformer for scipy >= 1.9 (#27818) ENH Use scipy Yeo-Johnson implementation in PowerTransformer for scipy >= 1.9 May 5, 2025
@lesteve lesteve enabled auto-merge (squash) May 5, 2025 15:56
@lesteve lesteve merged commit 6c1d33f into scikit-learn:main May 5, 2025
38 checks passed
@lesteve
Copy link
Member

lesteve commented May 5, 2025

Thanks for your PR, merging!

@smarie
Copy link
Contributor

smarie commented May 5, 2025

Congratulations, team ! And thanks to the reviewers

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use scipy.stats.yeojohnson PowerTransformer
6 participants