[TEST PR] Adding oblique trees (i.e. Forest-RC) to cythonized tree module #11

adam2392 · 2021-09-04T00:56:19Z

Reference Issues/PRs

Fixes:

What does this implement/fix? Explain your changes.

Adds cythonized oblique trees to the tree module. This is known as Forest-RC in the Breiman 2001 paper.

_oblique_tree.pxd/pyx: This file implements i) the ObliqueTree(Tree), which defines a few additional class members for storing the projection weights and indices and a new function for adding an oblique node, and ii) the ObliqueTreeBuilder(TreeBuilder), which defines how to build the oblique tree.
_oblique_splitter.pxd/pyx: This is the main change, which i) defines an ObliqueSplitRecord for keeping track of oblique splits, and ii) defines an ObliqueSplitter(Splitter) which gets oblique node splits and samples projection matrices, while also storing additional hyperparameters.
_classes.py: Defines new Python interfaces for the Oblique trees and forests
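
For readers unfamiliar with oblique splits, here is a minimal NumPy sketch of what the stored projection weights and indices are for. It is not the Cython code from this PR; the helper names `oblique_feature` and `goes_left` are hypothetical, and only `proj_vec_indices` / `proj_vec_weights` mirror names used in the PR.

```python
import numpy as np


def oblique_feature(X, proj_vec_indices, proj_vec_weights):
    """Project every sample onto a sparse weight vector.

    Only the columns listed in proj_vec_indices contribute to the result.
    (Hypothetical helper, not part of scikit-learn or this PR.)
    """
    X = np.asarray(X, dtype=float)
    idx = np.asarray(proj_vec_indices, dtype=int)
    w = np.asarray(proj_vec_weights, dtype=float)
    return X[:, idx] @ w


def goes_left(X, proj_vec_indices, proj_vec_weights, threshold):
    """A sample goes left when its projected value is <= threshold."""
    return oblique_feature(X, proj_vec_indices, proj_vec_weights) <= threshold


X = np.array([[1.0, 2.0, 3.0],
              [4.0, 0.0, 1.0]])

# Oblique split on x0 - x2 with threshold 0.
mask = goes_left(X, [0, 2], [1.0, -1.0], threshold=0.0)

# An axis-aligned split is the special case of one index with weight 1.
axis_mask = goes_left(X, [1], [1.0], threshold=1.0)
```

Seen this way, an axis-aligned node needs only a feature index and a threshold, while an oblique node additionally needs the two small arrays, which is why the PR adds storage for them.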

Any other comments?

I'm not an expert in the Cython and C++ interplay, but I suspect that if we can "generalize" the Node struct to carry projection vector and weight information (not used in Forest-RI, i.e. the axis-aligned Random Forest), then much of the new tree and tree-building code would not be necessary. The only thing that is different at a fundamental level is the idea of a sample_proj_mat at each node of the tree, which samples sparse combinations of features.
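
To make the sample_proj_mat idea concrete, here is a rough NumPy sketch of sampling sparse feature combinations at a node. The signature, the parameter names `max_features` and `feature_combinations`, and the choice of ±1 weights are all assumptions made for illustration; the Cython implementation in this PR may sample differently.

```python
import numpy as np


def sample_proj_mat(n_features, max_features, feature_combinations, rng):
    """Sample a sparse projection matrix (illustrative stand-in only).

    Each of the max_features rows is one candidate projection that mixes
    feature_combinations randomly chosen columns with weights of +1 or -1.
    """
    proj_mat = np.zeros((max_features, n_features))
    for row in range(max_features):
        # Pick which input features take part in this candidate projection.
        idx = rng.choice(n_features, size=feature_combinations, replace=False)
        # Assumed weighting scheme: random signs.
        proj_mat[row, idx] = rng.choice([-1.0, 1.0], size=feature_combinations)
    return proj_mat


rng = np.random.default_rng(0)
proj_mat = sample_proj_mat(n_features=10, max_features=4,
                           feature_combinations=3, rng=rng)

# Candidate split features for 5 samples: one column per projection.
X = rng.normal(size=(5, 10))
candidates = X @ proj_mat.T
```

Once the candidate columns exist, the usual threshold search can run on each of them exactly as it does on raw features in the axis-aligned case.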

Another component that is currently missing is support for sparse data, but I presume this could be added easily.

Code That Can Be Shortened

New data structures and classes, ObliqueSplitRecord, ObliqueSplitter, ObliqueTree are defined. However, if the existing SplitRecord, Splitter, Tree can be generalized, then the existing functions can just be used to build Oblique Trees too.

However, I'm not sure whether the scikit-learn devs would want that, rather than just replicating some code across these two cases.

thomasjpfan

Thanks for the prototype, I think I see what is required to make it easier for you to extend scikit-learn's trees.

In the redesign, I think methods such as tree._apply_dense and DenseObliqueSplitter.node_split would still need to exist to perform your custom logic + data structures.

thomasjpfan · 2021-11-09T22:58:03Z

sklearn/tree/_oblique_tree.pyx

+                node_id = tree._add_oblique_node(parent, is_left, is_leaf, split.feature,
+                                         split.threshold, impurity, n_node_samples,
+                                         weighted_n_node_samples,
+                                         split.proj_vec_weights,
+                                         split.proj_vec_indices)


This is a different API from scikit-learn's. As you mentioned in office hours, to better support this use case, tree._add_node should accept a SplitRecord, and SplitRecord should be a class that can be subclassed. WDYT?

adam2392 · 2021-11-09

Yes, if SplitRecord can become a class that can be subclassed, then this becomes easier. However, I wasn't sure whether the struct was there for performance or maintenance reasons.
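
The proposal discussed here can be sketched in plain Python as follows. This only illustrates the shape of the API (one `_add_node` that accepts any split record); it says nothing about the performance question, and every field and the `NO_PARENT` constant are assumptions rather than the actual Cython definitions.

```python
from dataclasses import dataclass, field

NO_PARENT = -1  # hypothetical sentinel for the root's parent


@dataclass
class SplitRecord:
    """Axis-aligned split: a feature index and a threshold."""
    feature: int
    threshold: float


@dataclass
class ObliqueSplitRecord(SplitRecord):
    """Adds the sparse projection on top of the base record."""
    proj_vec_weights: list = field(default_factory=list)
    proj_vec_indices: list = field(default_factory=list)


class Tree:
    def __init__(self):
        self.nodes = []

    def _add_node(self, parent, is_left, is_leaf, split, impurity,
                  n_node_samples):
        """Store a node; the split may be any SplitRecord subclass."""
        node_id = len(self.nodes)
        self.nodes.append({
            "parent": parent,
            "is_left": is_left,
            "is_leaf": is_leaf,
            "split": split,
            "impurity": impurity,
            "n_node_samples": n_node_samples,
        })
        return node_id


tree = Tree()
root = tree._add_node(NO_PARENT, False, False,
                      SplitRecord(feature=2, threshold=0.5), 0.4, 100)
child = tree._add_node(root, True, False,
                       ObliqueSplitRecord(feature=0, threshold=-1.0,
                                          proj_vec_weights=[1.0, -1.0],
                                          proj_vec_indices=[0, 3]),
                       0.3, 60)
```

With this shape, a separate `_add_oblique_node` is no longer needed, because the extra arrays travel inside the record instead of being passed as extra arguments.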

thomasjpfan · 2021-11-09T22:58:45Z

sklearn/tree/_oblique_tree.pyx

+                           (impurity <= min_impurity_decrease))
+
+                if not is_leaf:
+                    splitter.oblique_node_split(impurity, &split, &n_constant_features)


This is the same API as scikit-learn. As long as SplitRecord can be subclassed, I think this should just work with your custom splitter.

adam2392 · 2021-11-10

Yes, what I specced out, at least initially, is that almost all of the Tree class currently in scikit-learn can be used for any "new tree" that only changes the splitter function, as long as SplitRecord and the type of splitter can be subclassed, since the tree doesn't do anything fancy except call Splitter functions.
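
A toy sketch of that point: the builder below never looks inside a split, it only calls the splitter, so swapping the splitter is enough to change the kind of tree. Both splitter classes and `build_tree` are hypothetical, and for brevity they split at the median of the projected values instead of searching for the best impurity decrease.

```python
import numpy as np


class AxisAlignedSplitter:
    """Toy splitter: use one fixed column as the split feature."""

    def __init__(self, feature):
        self.feature = feature

    def project(self, X):
        return X[:, self.feature]

    def node_split(self, X):
        return {"project": self.project,
                "threshold": float(np.median(self.project(X)))}


class ObliqueSplitter(AxisAlignedSplitter):
    """Toy splitter: use a fixed sparse linear combination instead."""

    def __init__(self, indices, weights):
        self.indices = np.asarray(indices, dtype=int)
        self.weights = np.asarray(weights, dtype=float)

    def project(self, X):
        return X[:, self.indices] @ self.weights


def build_tree(X, splitter, max_depth):
    """Depth-first builder; all split logic lives in the splitter."""
    nodes = []
    stack = [(np.arange(X.shape[0]), 0)]
    while stack:
        sample_idx, depth = stack.pop()
        split = None
        if depth < max_depth and sample_idx.size >= 2:
            candidate = splitter.node_split(X[sample_idx])
            values = candidate["project"](X[sample_idx])
            go_left = values <= candidate["threshold"]
            # Keep the split only if it actually separates the samples.
            if go_left.any() and not go_left.all():
                split = candidate
                stack.append((sample_idx[~go_left], depth + 1))
                stack.append((sample_idx[go_left], depth + 1))
        nodes.append({"n_node_samples": int(sample_idx.size), "split": split})
    return nodes


X = np.arange(16.0).reshape(8, 2)
axis_nodes = build_tree(X, AxisAlignedSplitter(feature=0), max_depth=2)
oblique_nodes = build_tree(X, ObliqueSplitter([0, 1], [1.0, 1.0]), max_depth=2)
```

The same `build_tree` handles both cases unchanged, which is the property the refactor of the real Tree class would aim for.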

adam2392 · 2022-03-10T16:09:11Z

Some notes as of 3/10/22

I have been working on a refactor that would make adding oblique trees to the existing codebase a lot simpler.

Some notes for further improvement:

To enable a fused-type member in a class, use Tempita; see "Allow fused types in extension type attributes", cython/cython#3283 (comment).
Migrating proj_vec_indices and the weights into an ObliqueNode struct requires converting a vector pointer to a NumPy array to get the DTYPE (see: https://stackoverflow.com/questions/58476007/creating-a-numpy-array-from-a-pointer-in-cython)
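
As a pure-Python analogue of the pointer-to-array step, the sketch below wraps raw memory behind a ctypes pointer as a NumPy array without copying. It is not the Cython code itself (there the usual route is a typed-memoryview cast of the pointer followed by `np.asarray`); the ctypes buffer merely stands in for memory owned by C/C++ code such as a vector's data.

```python
import ctypes

import numpy as np

# Stand-in for memory owned by C/C++ code (e.g. a vector of doubles).
n = 4
buffer = (ctypes.c_double * n)(0.5, -1.0, 1.0, 2.0)
ptr = ctypes.cast(buffer, ctypes.POINTER(ctypes.c_double))

# Wrap the pointer without copying. The shape must be given explicitly
# because a bare pointer carries no length information.
weights = np.ctypeslib.as_array(ptr, shape=(n,))
weights_dtype = weights.dtype  # inferred from the pointed-to C type

# The array is a view: writing through it changes the underlying memory.
weights[0] = 9.0
first_value = buffer[0]
```

Because the array does not own the memory, the underlying buffer has to outlive it; the same caveat applies when wrapping a C++ vector's data from Cython.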

adam2392 mentioned this pull request Sep 4, 2021

Adding Oblique Trees (Forest-RC) to the Cythonized Tree Module scikit-learn/scikit-learn#20819

Open

adam2392 force-pushed the obliquetrees branch from 4addedb to 634a428 on September 7, 2021 14:51

adam2392 added 7 commits September 7, 2021 13:02

Precomit

d5d248c

Adjust file change

7e3ed8d

Make treebuilder inherit

f7f9502

Fix flake

741e327

Prehook

22706cd

Cleanup

0de6760

Merge tests

44f688c

adam2392 force-pushed the obliquetrees branch from 634a428 to 44f688c on September 7, 2021 17:02

adam2392 added 2 commits September 26, 2021 21:18

Merge branch 'main' into obliquetrees

603e743

Try stuff.

1844a17

thomasjpfan reviewed Nov 9, 2021

adam2392 and others added 17 commits December 15, 2021 21:30

Merging main

d8e288d

Merging main

94e02ad

Reverting work to match unmodular tree, which works though. Previous …

017747f

…version compiles, but doesn't work due to the lack of modularity in the existing tree code

Updated notebookg

f420507

Updated notebookg

33ffd6d

add requirements

4b3d286

remove requirements.txt

67c8614

fix typo to properly instantiate ObliqueRandomForestClassifier

6d9f9f8

add ObliqueRandomForestClassifier in __init__.py

6228cfb

Merge pull request #13 from neurodata/obliquetrees-jms

b1a51ae

add RF vs OF benchmarking notebook

b80368e

minor change in markdown description

9d026ee

Merge branch 'main' into obliquetrees

4f2f369

Merge branch 'obliquetrees' of github.com:neurodata/scikit-learn into…

ab23d99

… obliquetrees

Merge pull request #14 from jshinm/obliquetrees-jms

a22795a

Create RF vs OF benchmarking notebook

Adding comparison notebook

0b0b48f

Merge branch 'main' into obliquetrees

01dc065

adam2392 closed this Jun 13, 2023

adam2392 deleted the obliquetrees branch June 13, 2023 14:33
