-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
scikit-learn 1.6 changed behavior of growing trees #30554
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Thank you for the report! I agree this is a bug & opened #30557 to fix it. |
@sebp thanks a lot for the reproducer and the diagnostic! As a maintainer, high quality reports like yours are so useful 🙏! Something I would suggest if you think this makes sense for your project: test scikit-learn development wheels so you can give feed-back closer to the time the PR is merged rather than after we release. If you think it it not worth it for your project, it is completely fine of course! For example, if this is the first time in 5 years that a new scikit-learn release breaks scikit-survival, then this is probably not worth it ... If you think this is worth it, I would be keen to help a bit on the CI set-up side, so let me know! This would be an opportunity for me to realise how much work this is and get a better feeling of the trade-off involved. The basic idea is to install scikit-learn development wheel (built daily from
see https://scikit-learn.org/dev/developers/advanced_installation.html#installing-nightly-builds. There is a number of other projects, in case you feel more adventurous and want to test other projects dependencies (numpy, scipy, matplotlib, etc ...) see https://anaconda.org/scientific-python-nightly-wheels/repo for the full list. Why you would want to do this? (optional reading 😅)We have been doing something similar in scikit-learn for numpy and scipy for some time, and I find this extremely useful for all the projects involved. Advantages for downstream (scikit-learn in our case, scikit-survival in your case)
Advantages for upstream (numpy or scipy in our case, scikit-learn in your case)
Advantages for the longer-term health of the ecosystem
Feel-good stories about this kind of cross-projects interactions (optional reading 😅)Here are a few of these interactions I managed to find (mostly because they happened not so long ago) and that are part of my personal feel-good open-source stories:
|
Thanks for your feedback.
Testing against nightly builds of scikit-learn could actually be quite
useful, because typically each new release of scikit-learn requires some
changes applied to scikit-survival too.
|
Describe the bug
While porting scikit-survival to support scikit-learn 1.6, I noticed that one test failed due to trees in a random forest having a different structure (see this GitHub Actions log).
Using git bisect, I could determine that #29458 is the culprit.
The PR imports
log
fromlibc.math
:scikit-learn/sklearn/tree/_partitioner.pyx
Line 14 in 23c1965
Previously,
log
was imported from._utils
:scikit-learn/sklearn/tree/_splitter.pyx
Line 10 in 215be2e
which actually implements
log2
:scikit-learn/sklearn/tree/_utils.pyx
Lines 65 to 66 in 215be2e
Replacing
from libc.math cimport isnan, log
with
fixes the problem.
Steps/Code to Reproduce
Expected Results
Actual Results
Versions
The text was updated successfully, but these errors were encountered: