Use float32_t for tree.threshold #27535

lorentzenchr · 2023-10-05T08:57:06Z

The features X in our standard decision trees are float32, so it would make sense for the split thresholds on those features to be float32 as well; see

scikit-learn/sklearn/tree/_tree.pxd, line 31 in 8ae5f18:

DOUBLE_t threshold  # Threshold value at the node

Note that the Cython trees are exposed in our public estimators, e.g. via the DecisionTreeRegressor attribute tree_.


adam2392 · 2023-10-26T00:49:43Z

This makes sense to me and would help reduce the memory/disk-storage footprint of the tree.
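Whether a narrower threshold actually shrinks each stored node depends on struct padding. The sketch below builds hypothetical node layouts with `ctypes`, assuming a record of 64-bit integers and doubles plus a one-byte flag, loosely in the spirit of the `Node` struct in `_tree.pxd`. The field names and their order are illustrative assumptions, not the actual definition.

```python
import ctypes

# Hypothetical node layouts: field names and order are assumptions for illustration.
# Integer fields are int64, as on a typical 64-bit build.
_COMMON_HEAD = [
    ("left_child", ctypes.c_int64),
    ("right_child", ctypes.c_int64),
    ("feature", ctypes.c_int64),
]


class NodeF64(ctypes.Structure):
    # threshold stored as a C double
    _fields_ = _COMMON_HEAD + [
        ("threshold", ctypes.c_double),
        ("impurity", ctypes.c_double),
        ("n_node_samples", ctypes.c_int64),
        ("weighted_n_node_samples", ctypes.c_double),
        ("missing_go_to_left", ctypes.c_uint8),
    ]


class NodeF32(ctypes.Structure):
    # same field order, threshold narrowed to a C float
    _fields_ = _COMMON_HEAD + [
        ("threshold", ctypes.c_float),
        ("impurity", ctypes.c_double),
        ("n_node_samples", ctypes.c_int64),
        ("weighted_n_node_samples", ctypes.c_double),
        ("missing_go_to_left", ctypes.c_uint8),
    ]


class NodeF32Packed(ctypes.Structure):
    # float threshold moved next to the 1-byte flag so both share one 8-byte slot
    _fields_ = _COMMON_HEAD + [
        ("impurity", ctypes.c_double),
        ("n_node_samples", ctypes.c_int64),
        ("weighted_n_node_samples", ctypes.c_double),
        ("threshold", ctypes.c_float),
        ("missing_go_to_left", ctypes.c_uint8),
    ]


for cls in (NodeF64, NodeF32, NodeF32Packed):
    print(cls.__name__, ctypes.sizeof(cls))
```

On a typical 64-bit platform this prints 64, 64 and 56 bytes. In this sketch, narrowing the field alone is absorbed by alignment padding before the next double. The per-node saving only appears if the float field sits next to other small fields.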

As a heavy user of the tree submodule, I don't see a backward-compatibility issue. Even if I saved a tree with an earlier sklearn version, traversing it with data means comparing float64 thresholds against float32 features anyway, so any differences would be entirely due to machine precision.
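Narrowing old float64 thresholds is not always a harmless precision loss, and one case shows why. Assume, for illustration, that a split threshold is the float64 midpoint of two adjacent float32 feature values `a < b`, and that samples go left when `x <= threshold`. Under those assumptions, a round-to-nearest cast to float32 can land exactly on `b` and send samples equal to `b` to the other child. This sketch uses only the standard library (`struct` for float32 rounding). The helper names are hypothetical.

```python
import struct


def to_f32(value: float) -> float:
    """Round a float64 to the nearest float32 (ties to even), returned as a Python float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def f32_from_bits(bits: int) -> float:
    """Build a float32 value from its raw IEEE 754 bit pattern."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# Two adjacent float32 feature values; a has an odd last mantissa bit.
a = f32_from_bits(0x3F800001)  # 1 + 2**-23
b = f32_from_bits(0x3F800002)  # 1 + 2**-22, the next float32 above a

threshold64 = a / 2.0 + b / 2.0    # exact midpoint, representable in float64
threshold32 = to_f32(threshold64)  # exact tie, so it rounds to the even neighbour b

print(threshold32 == b)  # True
print(b <= threshold64)  # False: a sample equal to b goes right
print(b <= threshold32)  # True: after the naive cast it goes left
print(a <= threshold32)  # True: samples equal to a are unaffected
```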

The only issue I can see is if a user explicitly uses the tree_.threshold property directly.

lorentzenchr · 2023-11-01T15:45:31Z

> The only issue I can see is if a user explicitly uses the tree_.threshold property directly.

I would say that even then one gets the right answer with float32; it's just less precise. In truth, float64 simply offers more precision than the threshold is accurate to, since the threshold is derived from float32 feature values.
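If thresholds were narrowed, rounding toward negative infinity would keep every `x <= threshold` decision on float32 inputs identical, not merely close. For any float32 `x` and float64 `t`, `x <= t` holds exactly when `x <= d`, where `d` is the largest float32 not exceeding `t`. The reason is that any float32 at or below `t` is also at or below `d`, and anything at or below `d` is at or below `t`. Below is a minimal stdlib sketch, assuming finite thresholds within float32 range. `to_f32_down` and the other helpers are hypothetical, not existing scikit-learn functions.

```python
import struct


def to_f32(value: float) -> float:
    """Round a float64 to the nearest float32 (ties to even)."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def f32_bits(value: float) -> int:
    """Raw IEEE 754 bit pattern of value after conversion to float32."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def f32_from_bits(bits: int) -> float:
    """float32 value (as a Python float) with the given bit pattern."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def next_f32_below(value: float) -> float:
    """Adjacent float32 toward -inf from a finite float32 value."""
    if value == 0.0:
        return -f32_from_bits(1)  # smallest-magnitude negative subnormal
    bits = f32_bits(value)
    # Bit patterns grow with magnitude for both signs: step the magnitude
    # down for positive values and up for negative ones.
    return f32_from_bits(bits - 1 if value > 0 else bits + 1)


def to_f32_down(value: float) -> float:
    """Largest float32 <= value; assumes value is finite and within float32 range."""
    nearest = to_f32(value)
    return next_f32_below(nearest) if nearest > value else nearest


# Midpoint threshold between adjacent float32 values a < b (the tie case).
a = f32_from_bits(0x3F800001)
b = f32_from_bits(0x3F800002)
t = a / 2.0 + b / 2.0
d = to_f32_down(t)
print(d == a)  # True: rounds down to a instead of up to b

# Every float32 in a window around t makes the same decision with t and d.
for bits in range(0x3F7FFFF0, 0x3F800010):
    x = f32_from_bits(bits)
    assert (x <= t) == (x <= d)
```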

adam2392 · 2023-11-01T15:55:38Z

I meant that if users access the threshold directly outside sklearn code, for whatever reason, and compare it against something with float64 precision, then changing 'threshold' to float32 would break their workflow.
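A value that is exact in float64 is generally not exact in float32. User code that compares `tree_.threshold` against float64 quantities with `==` or a tight ordering could therefore behave differently after narrowing. Here is a stdlib-only sketch, with `0.1` standing in for a hypothetical user-supplied float64 cut-off:

```python
import struct


def to_f32(value: float) -> float:
    """Round a float64 to the nearest float32 (ties to even)."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


user_cutoff = 0.1                  # hypothetical float64 value in user code
threshold32 = to_f32(user_cutoff)  # what a float32 threshold would hold

print(threshold32 == user_cutoff)  # False: equality no longer holds
print(threshold32 > user_cutoff)   # True: float32(0.1) rounds slightly up
print(repr(threshold32))
```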

I agree that float32 comparisons are in general less precise, but, as your issue points out, 'threshold' is mainly used in sklearn code to compare against X, which is converted to float32 anyway.

Maybe a path forward is to emit a deprecation warning when tree_.threshold is accessed, and then actually change it to float32 in v1.5?
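The deprecation path could look roughly like the following plain-Python sketch. The class `TreeArraysSketch`, its `_threshold` storage, and the warning text are hypothetical stand-ins. The real `tree_` object is a Cython extension type, and scikit-learn's own deprecation helpers are not used here.

```python
import warnings
from array import array


class TreeArraysSketch:
    """Hypothetical stand-in for a fitted tree's array attributes."""

    def __init__(self, thresholds):
        # thresholds kept as float64 ('d'), i.e. the current dtype
        self._threshold = array("d", thresholds)

    @property
    def threshold(self):
        # warn on every access that the dtype is going to change
        warnings.warn(
            "The dtype of tree_.threshold will change from float64 to float32 "
            "in a future release.",
            FutureWarning,
            stacklevel=2,
        )
        return self._threshold


tree = TreeArraysSketch([0.5, 1.25])
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    values = list(tree.threshold)

print(values)              # [0.5, 1.25]
print(caught[0].category)  # <class 'FutureWarning'>
```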

lorentzenchr · 2023-11-01T16:03:26Z

float32 and float64 are pretty interoperable even in C on most operations, comparisons included.

adam2392 · 2023-11-01T17:15:30Z

SG! I'm in favor of switching the threshold type to float32_t :)

github-actions bot added the Needs Triage label Oct 5, 2023

lorentzenchr added the cython and Needs Decision - Backward Compatibility labels and removed the Needs Triage label Oct 5, 2023

KartikeyBartwal mentioned this issue Oct 5, 2023 in "tree_xpd threshold changed to float32" #27536 (Closed)

lorentzenchr added the Performance and module:tree labels Dec 13, 2023

adam2392 mentioned this issue Jan 19, 2024 in "Inconsistency in DecisionTreeClassifier Threshold Behavior" #28175 (Open)

lorentzenchr added the Needs Decision and Breaking Change labels and removed the Needs Decision - Backward Compatibility label Mar 14, 2024
