-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
Use float32_t for tree.threshold #27535
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
This makes sense to me and would help reduce the memory/disc-storage footprint of the tree. As a heavy-user of the tree submodule, I don't see an issue in backwards compatibility as even if I saved a tree from earlier sklearn versions, when traversing the tree with data, one is comparing The only issue I can see is if a user explicitly uses the |
I would say even then one gets the right answer with a float32, it's just less precise. But in truth a float64 has just more precision than the threshold is accurate. |
I meant that if they use the threshold directly outside sklearn code for whatever reason and it involves comparing to something with float64 precision then changing 'threshold' to float32 would break their workflow. I agree that float32 comparison in general is less precision but as your issue raised 'threshold' is mainly used in sklearn code to compare to X which is converted to float32 anyways. Maybe a path forward is just having a deprecation warning within the access of tree_.threshold and then in v1.5 actually change it to float32? |
float32 and float64 are pretty interoperable even in C on most operations, comparisons included. |
SG! I'm in favor of switching the |
The features
X
in our standard decision trees are float32, so it would make sense for the threshold of features to also be float32, seescikit-learn/sklearn/tree/_tree.pxd
Line 31 in 8ae5f18
Note that the Cython trees are expose in our trees, e.g.
DecisionTreeRegressor
attributetree_
.The text was updated successfully, but these errors were encountered: