Skip to content

FIX Work around likely SIMD issue in tree export on 32bit OS #29327

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jun 21, 2024

Conversation

lesteve
Copy link
Member

@lesteve lesteve commented Jun 21, 2024

Fix #27506.

I have put @mr-c,@gspr and @sergiopasra as co-authors, thanks for your inputs on this tricky issue 🙏!

I tested the fix in the Docker image provided by @mr-c in #27506 (comment) and it works

Note for Distro package managers the patch may not work directly because _IS_32BIT was moved from sklearn.utils to sklearn.utils.fixes . Honestly I am pretty sure that this code is not performance critical and that just using -1.0 * tree.impurity even on 64bit OS is a completely fine patch. After all this code was not SIMDed before using numpy 1.26 and nobody ever complained that it was slow ...

The main reason, I used _IS_32BIT is that it makes it more explicit and I hope that one day it can be cleaned up 🤞

lesteve and others added 2 commits June 21, 2024 06:31
Co-authored-by: Michael R. Crusoe <1330696+mr-c@users.noreply.github.com>
Co-authored-by: SGard Spreemann <gspr@nonempty.org>
Co-authored-by: Sergio Pascual <sergiopr@fis.ucm.es>
Copy link

github-actions bot commented Jun 21, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: d1c65fd. Link to the linter CI: here

@lesteve lesteve changed the title Export tree 32bit FIX Work-around likely SIMD issue on 32bit OS Jun 21, 2024
@lesteve lesteve changed the title FIX Work-around likely SIMD issue on 32bit OS FIX Work around likely SIMD issue on 32bit OS Jun 21, 2024
@lesteve lesteve changed the title FIX Work around likely SIMD issue on 32bit OS FIX Work around likely SIMD issue in tree export on 32bit OS Jun 21, 2024
@lesteve
Copy link
Member Author

lesteve commented Jun 21, 2024

The CI is 🚨 but I will merge main when #29328 is auto-merged and that should be all 💚

@lesteve lesteve added the Quick Review For PRs that are quick to review label Jun 21, 2024
@lesteve
Copy link
Member Author

lesteve commented Jun 21, 2024

@mr-c @gspr, just curious, would it make the Linux distribution packaging easier (my understanding is that you are involved in it somehow) if this fix was included in the soonish to be released bug-fix release scikit-learn 1.5.1? Maybe it would not make a big difference anyway, because you are going to package an older scikit-learn release like 1.3 or 1.4?

@mr-c
Copy link
Contributor

mr-c commented Jun 21, 2024

@mr-c @gspr, just curious, would it make the Linux distribution packaging easier (my understanding is that you are involved in it somehow) if this fix was included in the soonish to be released bug-fix release scikit-learn 1.5.1? Maybe it would not make a big difference anyway, because you are going to package an older scikit-learn release like 1.3 or 1.4?

Debian is already carrying a version of this patch on top of version 1.4.2 in our development distribution and that will soon migrate to our testing distribution

https://tracker.debian.org/pkg/scikit-learn

We are carrying other patches as well, perhaps some of them could also be upstreamed

https://salsa.debian.org/science-team/scikit-learn/-/tree/master/debian/patches?ref_type=heads

(I'm not one of the primary maintainers of scikit-learn in Debian, but I'm on the team and recently merged some fixes)

We have not yet packaged scikit-learn 1.5.x (possibly due to a bug in how we check for new versions, I will look into that now)

So, no rush needed for Debian. Thank you for asking!

@gspr
Copy link
Contributor

gspr commented Jun 21, 2024

@mr-c @gspr, just curious, would it make the Linux distribution packaging easier (my understanding is that you are involved in it somehow) if this fix was included in the soonish to be released bug-fix release scikit-learn 1.5.1? Maybe it would not make a big difference anyway, because you are going to package an older scikit-learn release like 1.3 or 1.4?

To add to what @mr-c said: Thank you for asking! This kind of thought and consideration given to distributions really gives me confidence in the scikit-learn developers ❤️

@mr-c
Copy link
Contributor

mr-c commented Jun 21, 2024

possibly due to a bug in how we check for new versions, I will look into that now)

Ah, the source distribution now has an underscore instead of a hyphen: https://pypi.org/project/scikit-learn/1.5.0/#files vs https://pypi.org/project/scikit-learn/1.4.2/#files

I have fixed our regex to account for that

@lesteve
Copy link
Member Author

lesteve commented Jun 21, 2024

Ah, the source distribution now has an underscore instead of a hyphen: pypi.org/project/scikit-learn/1.5.0/#files vs pypi.org/project/scikit-learn/1.4.2/#files

Oh well, probably a side-effect of moving to Meson (and away from setuptools). This was noticed in #28757 (comment) actually, but would I have believed that someone out there was relying on this? Certainly not 😅 ...

Copy link
Member

@jeremiedbb jeremiedbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a changelog entry. I would put it for 1.5.1

Comment on lines 267 to 268
minus_impurity = -1.0 * tree.impurity if _IS_32BIT else -tree.impurity
self.colors["bounds"] = (np.min(minus_impurity), np.max(minus_impurity))
Copy link
Member

@jeremiedbb jeremiedbb Jun 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would the following work ? It avoids 1 unnecessary array multiplication and it avoids to specialize the behavior for 32bit OS.

Suggested change
minus_impurity = -1.0 * tree.impurity if _IS_32BIT else -tree.impurity
self.colors["bounds"] = (np.min(minus_impurity), np.max(minus_impurity))
# equivalent to (np.min(-tree.impurity), np.max(-tree.impurity))
self.colors["bounds"] = (-np.max(tree.impurity), -np.min(tree.impurity))

Copy link
Member Author

@lesteve lesteve Jun 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was my original thought in #27506 (comment) 😉.

I find it slightly too clever and I kind of doubt this matters that much in practice since this is not in a performance-critical part of the code (visualization/debugging)

Having said that, the min(-arr) == -max(arr) trick is now the official Debian patch for scikit-learn 1.4 apparently: https://salsa.debian.org/science-team/scikit-learn/-/blob/master/debian/patches/i386-impurity-nan.patch?ref_type=heads 😅

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was my original thought [...] I find it too clever

😄

Copy link
Member

@jeremiedbb jeremiedbb Jun 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a matter of taste indeed. I find it more intuitive.
One argument: the workaround should be removed when the bug is fixed upstream, while the suggestion can be let as is for ever. I updated the suggestion to add a comment to make it less surprising

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jeremiedbb OK you won me over by talking to my ex-physicist side 😉, I have added a comment to kind of indicate there is a reason why we write it like this.

I have also added an entry in the 1.5.1 changelog.

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@jeremiedbb jeremiedbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jeremiedbb jeremiedbb merged commit a67ebbe into scikit-learn:main Jun 21, 2024
30 checks passed
jeremiedbb pushed a commit to jeremiedbb/scikit-learn that referenced this pull request Jul 2, 2024
…learn#29327)

Co-authored-by: Michael R. Crusoe <1330696+mr-c@users.noreply.github.com>
Co-authored-by: SGard Spreemann <gspr@nonempty.org>
Co-authored-by: Sergio Pascual <sergiopr@fis.ucm.es>
@jeremiedbb jeremiedbb mentioned this pull request Jul 2, 2024
11 tasks
jeremiedbb pushed a commit to jeremiedbb/scikit-learn that referenced this pull request Jul 2, 2024
…learn#29327)

Co-authored-by: Michael R. Crusoe <1330696+mr-c@users.noreply.github.com>
Co-authored-by: SGard Spreemann <gspr@nonempty.org>
Co-authored-by: Sergio Pascual <sergiopr@fis.ucm.es>
jeremiedbb pushed a commit that referenced this pull request Jul 2, 2024
Co-authored-by: Michael R. Crusoe <1330696+mr-c@users.noreply.github.com>
Co-authored-by: SGard Spreemann <gspr@nonempty.org>
Co-authored-by: Sergio Pascual <sergiopr@fis.ucm.es>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
module:tree Quick Review For PRs that are quick to review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Test failure in i686 with version 1.3.1
5 participants