-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
FIX Work around likely SIMD issue in tree export on 32bit OS #29327
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Co-authored-by: Michael R. Crusoe <1330696+mr-c@users.noreply.github.com> Co-authored-by: SGard Spreemann <gspr@nonempty.org> Co-authored-by: Sergio Pascual <sergiopr@fis.ucm.es>
The CI is 🚨 but I will merge main when #29328 is auto-merged and that should be all 💚 |
…nto export-tree-32bit
@mr-c @gspr, just curious, would it make the Linux distribution packaging easier (my understanding is that you are involved in it somehow) if this fix was included in the soonish to be released bug-fix release scikit-learn 1.5.1? Maybe it would not make a big difference anyway, because you are going to package an older scikit-learn release like 1.3 or 1.4? |
Debian is already carrying a version of this patch on top of version 1.4.2 in our development distribution and that will soon migrate to our testing distribution https://tracker.debian.org/pkg/scikit-learn We are carrying other patches as well, perhaps some of them could also be upstreamed https://salsa.debian.org/science-team/scikit-learn/-/tree/master/debian/patches?ref_type=heads (I'm not one of the primary maintainers of scikit-learn in Debian, but I'm on the team and recently merged some fixes) We have not yet packaged scikit-learn 1.5.x (possibly due to a bug in how we check for new versions, I will look into that now) So, no rush needed for Debian. Thank you for asking! |
To add to what @mr-c said: Thank you for asking! This kind of thought and consideration given to distributions really gives me confidence in the scikit-learn developers ❤️ |
Ah, the source distribution now has an underscore instead of a hyphen: https://pypi.org/project/scikit-learn/1.5.0/#files vs https://pypi.org/project/scikit-learn/1.4.2/#files I have fixed our regex to account for that |
Oh well, probably a side-effect of moving to Meson (and away from setuptools). This was noticed in #28757 (comment) actually, but would I have believed that someone out there was relying on this? Certainly not 😅 ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs a changelog entry. I would put it for 1.5.1
sklearn/tree/_export.py
Outdated
minus_impurity = -1.0 * tree.impurity if _IS_32BIT else -tree.impurity | ||
self.colors["bounds"] = (np.min(minus_impurity), np.max(minus_impurity)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would the following work ? It avoids 1 unnecessary array multiplication and it avoids to specialize the behavior for 32bit OS.
minus_impurity = -1.0 * tree.impurity if _IS_32BIT else -tree.impurity | |
self.colors["bounds"] = (np.min(minus_impurity), np.max(minus_impurity)) | |
# equivalent to (np.min(-tree.impurity), np.max(-tree.impurity)) | |
self.colors["bounds"] = (-np.max(tree.impurity), -np.min(tree.impurity)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was my original thought in #27506 (comment) 😉.
I find it slightly too clever and I kind of doubt this matters that much in practice since this is not in a performance-critical part of the code (visualization/debugging)
Having said that, the min(-arr) == -max(arr)
trick is now the official Debian patch for scikit-learn 1.4 apparently: https://salsa.debian.org/science-team/scikit-learn/-/blob/master/debian/patches/i386-impurity-nan.patch?ref_type=heads 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was my original thought [...] I find it too clever
😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a matter of taste indeed. I find it more intuitive.
One argument: the workaround should be removed when the bug is fixed upstream, while the suggestion can be let as is for ever. I updated the suggestion to add a comment to make it less surprising
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jeremiedbb OK you won me over by talking to my ex-physicist side 😉, I have added a comment to kind of indicate there is a reason why we write it like this.
I have also added an entry in the 1.5.1 changelog.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…learn#29327) Co-authored-by: Michael R. Crusoe <1330696+mr-c@users.noreply.github.com> Co-authored-by: SGard Spreemann <gspr@nonempty.org> Co-authored-by: Sergio Pascual <sergiopr@fis.ucm.es>
…learn#29327) Co-authored-by: Michael R. Crusoe <1330696+mr-c@users.noreply.github.com> Co-authored-by: SGard Spreemann <gspr@nonempty.org> Co-authored-by: Sergio Pascual <sergiopr@fis.ucm.es>
Co-authored-by: Michael R. Crusoe <1330696+mr-c@users.noreply.github.com> Co-authored-by: SGard Spreemann <gspr@nonempty.org> Co-authored-by: Sergio Pascual <sergiopr@fis.ucm.es>
Fix #27506.
I have put @mr-c,@gspr and @sergiopasra as co-authors, thanks for your inputs on this tricky issue 🙏!
I tested the fix in the Docker image provided by @mr-c in #27506 (comment) and it works
Note for Distro package managers the patch may not work directly because
_IS_32BIT
was moved fromsklearn.utils
tosklearn.utils.fixes
. Honestly I am pretty sure that this code is not performance critical and that just using-1.0 * tree.impurity
even on 64bit OS is a completely fine patch. After all this code was not SIMDed before using numpy 1.26 and nobody ever complained that it was slow ...The main reason, I used
_IS_32BIT
is that it makes it more explicit and I hope that one day it can be cleaned up 🤞