MAINT Split Tree into a BaseTree and a Tree subclass to allow easier inheritance  #25119

@adam2392

Summary

#24678 introduces a modularization of Criterion to allow different criteria to be used with the same classes.
#25101 introduces a modularization of Splitter to allow different types of splits to be computed.

Now comes the time to also modularize the Tree class. A good Tree class should enable oblique splits, causal leaf nodes (i.e. leaf nodes that are set differently from split nodes), quantile trees (which likewise set leaf nodes differently from split nodes), and unsupervised trees. Note that another feature of causal trees is 'honesty', which should be easier to add after this issue is resolved.

Proposed improvement

We will have the following improvements:

  1. Refactor tree._add_node() to set the split node and leaf node differently.
  2. Refactor to have a 'splitptr' for SplitRecord, which allows for generalizations of the SplitRecord.
  3. Separate Tree into generic and abstract base functions for BaseTree and specific supervised axis-aligned functions for Tree.
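
To illustrate items 1 and 3, here is a minimal pure-Python sketch of the intended shape. It is not the actual Cython code and does not reproduce the real `_add_node` signature: the `Node` fields, the `NO_PARENT` sentinel, the hook names `_set_split_node` and `_set_leaf_node`, and the `MedianLeafTree` subclass are all hypothetical names chosen for illustration.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from statistics import mean, median
from typing import List, Optional, Sequence, Tuple

NO_PARENT = -1  # hypothetical sentinel: the node being added is the root
NO_CHILD = -1   # hypothetical sentinel: child slot not filled yet


@dataclass
class Node:
    # Simplified stand-in for the tree's node struct (assumed fields).
    left: int = NO_CHILD
    right: int = NO_CHILD
    feature: Optional[int] = None
    threshold: Optional[float] = None
    value: Optional[float] = None


class BaseTree(ABC):
    """Generic bookkeeping; how nodes are filled in is left to subclasses."""

    def __init__(self) -> None:
        self.nodes: List[Node] = []

    def _add_node(self, parent: int, is_left: bool, is_leaf: bool,
                  split: Optional[Tuple[int, float]],
                  samples: Optional[Sequence[float]]) -> int:
        node = Node()
        node_id = len(self.nodes)
        self.nodes.append(node)
        # Link the new node into its parent, if it has one.
        if parent != NO_PARENT:
            if is_left:
                self.nodes[parent].left = node_id
            else:
                self.nodes[parent].right = node_id
        # Split nodes and leaf nodes are set through separate hooks.
        if is_leaf:
            self._set_leaf_node(node, samples)
        else:
            self._set_split_node(node, split)
        return node_id

    @abstractmethod
    def _set_split_node(self, node: Node, split) -> None: ...

    @abstractmethod
    def _set_leaf_node(self, node: Node, samples) -> None: ...


class Tree(BaseTree):
    """Supervised, axis-aligned case: (feature, threshold) splits, mean leaves."""

    def _set_split_node(self, node: Node, split) -> None:
        node.feature, node.threshold = split

    def _set_leaf_node(self, node: Node, samples) -> None:
        node.value = mean(samples)


class MedianLeafTree(Tree):
    """Hypothetical subclass: only the leaf rule changes (quantile-style)."""

    def _set_leaf_node(self, node: Node, samples) -> None:
        node.value = median(samples)


tree = MedianLeafTree()
root = tree._add_node(NO_PARENT, False, False, (0, 0.5), None)
leaf = tree._add_node(root, True, True, None, [1.0, 2.0, 6.0])
```

In this sketch a third-party package overrides a single hook to change how leaves are set, while the node-linking logic in `BaseTree._add_node` is reused unchanged.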

Once the changes are made, one should verify:

  1. that the tree submodule's Cython code still builds (i.e. make clean and then pip install --verbose --no-build-isolation --editable . should not error out),
  2. that the unit tests inside sklearn/tree all pass, and
  3. that the asv benchmarks do not show a performance regression.

To run the benchmarks, use asv continuous --verbose --split --bench RandomForest upstream/main <new_branch_name> and then, for a side-by-side comparison, asv compare main <new_branch_name>

Reference

As discussed in #24577, I wrote up a doc on proposed improvements to the tree submodule that would:

  1. make it easier for 3rd party packages to subclass existing sklearn tree code and
  2. make it easier for sklearn itself to improve the tree code by adopting many of the modern improvements to trees.

cc: @jjerphan
