Skip to content

Use float32_t for tree.threshold #27535

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
lorentzenchr opened this issue Oct 5, 2023 · 5 comments
Open

Use float32_t for tree.threshold #27535

lorentzenchr opened this issue Oct 5, 2023 · 5 comments
Labels
Breaking Change Issue resolution would not be easily handled by the usual deprecation cycle. cython module:tree Needs Decision Requires decision Performance

Comments

@lorentzenchr
Copy link
Member

lorentzenchr commented Oct 5, 2023

The features X in our standard decision trees are float32, so it would make sense for the threshold of features to also be float32, see

DOUBLE_t threshold # Threshold value at the node

Note that the Cython trees are expose in our trees, e.g. DecisionTreeRegressor attribute tree_.

@adam2392
Copy link
Member

adam2392 commented Oct 26, 2023

This makes sense to me and would help reduce the memory/disc-storage footprint of the tree.

As a heavy-user of the tree submodule, I don't see an issue in backwards compatibility as even if I saved a tree from earlier sklearn versions, when traversing the tree with data, one is comparing float64 vs float32 anyways and any differences would be entirely due to machine precision.

The only issue I can see is if a user explicitly uses the tree_.threshold property directly.

@lorentzenchr
Copy link
Member Author

The only issue I can see is if a user explicitly uses the tree_.threshold property directly.

I would say even then one gets the right answer with a float32, it's just less precise. But in truth a float64 has just more precision than the threshold is accurate.

@adam2392
Copy link
Member

adam2392 commented Nov 1, 2023

I meant that if they use the threshold directly outside sklearn code for whatever reason and it involves comparing to something with float64 precision then changing 'threshold' to float32 would break their workflow.

I agree that float32 comparison in general is less precision but as your issue raised 'threshold' is mainly used in sklearn code to compare to X which is converted to float32 anyways.

Maybe a path forward is just having a deprecation warning within the access of tree_.threshold and then in v1.5 actually change it to float32?

@lorentzenchr
Copy link
Member Author

lorentzenchr commented Nov 1, 2023

float32 and float64 are pretty interoperable even in C on most operations, comparisons included.

@adam2392
Copy link
Member

adam2392 commented Nov 1, 2023

SG! I'm in favor of switching the threshold type to float32_t :)

@lorentzenchr lorentzenchr added Needs Decision Requires decision Breaking Change Issue resolution would not be easily handled by the usual deprecation cycle. and removed Needs Decision - Backward Compatibility labels Mar 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Breaking Change Issue resolution would not be easily handled by the usual deprecation cycle. cython module:tree Needs Decision Requires decision Performance
Projects
None yet
Development

No branches or pull requests

2 participants