[MRG] Tree apply #4065

galv · 2015-01-08T03:27:13Z

apply is now a public method for all decision trees, for #3832.

Added a docstring and an example demonstrating two uses of apply(): reducing the number of classes to predict, and building a one-hot feature encoding.

There is currently no function to do the one-hot encoding automatically. Should we consider adding one? transform() is already used to select only the features with highest importance.
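For illustration, here is a minimal sketch of the one-hot use of apply() mentioned above. It assumes a scikit-learn version in which `DecisionTreeClassifier.apply` is public; the dataset, `max_depth`, and variable names are arbitrary choices for this sketch and are not taken from the example added in this PR.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# apply() returns, for each sample, the index of the leaf it ends up in.
leaves = tree.apply(X)

# One-hot encode the leaf indices: one column per distinct leaf.
encoder = OneHotEncoder()
leaf_features = encoder.fit_transform(leaves.reshape(-1, 1))

print(leaf_features.shape)
```

Each row of `leaf_features` has exactly one non-zero entry, marking the leaf that sample reached.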


amueller · 2015-01-08T03:49:12Z

The issue with the transform is one of the reasons I added the RandomTreeEmbedding. Having that for trees that are trained in a supervised way would be good, though.

amueller · 2015-01-08T03:50:10Z

Should we test something?

jnothman · 2015-01-08T03:52:32Z

Something is tested, but in a very awkward place.

amueller · 2015-01-08T03:53:29Z

I'm blind apparently. But you are right, that is not the right place.

galv · 2015-01-08T05:04:25Z

I was also hesitant about that inserted line of test code.

It was my understanding that the only requirement for making the method public was that the input matrix X be converted to the correct format. From reading the code, I felt this meant two things:

1. Convert 64-bit floating point data to 32-bit floating point data.
2. Ensure all sparse matrices are CSC matrices, or convert them if not.

So these tests are indeed bad in hindsight, as they simply check that calling the private method returns the same result as calling the public method, instead of checking these conditions. I will look more thoroughly through the test file.

Though looking back I now realize that this misses the edge case where the indices of X are not 32-bit ints. (My understanding of what a correct input is was gleaned from _check_input() in _tree.pyx)

validation.check_array() does not check for this. I can add a manual check for it in apply(), in addition to validation.check_array(), at the cost of code duplication; otherwise, I would have to expose TreeBuilder's _check_input() to the Python code by changing its cdef declaration to cpdef in _tree.pxd and _tree.pyx, so that it performs all the requisite checks.

I'm hesitant to do the latter, since the ensemble methods build on the decision tree and a cpdef declaration would add overhead. Not to mention that _check_input() makes assumptions which don't apply here, such as that a TreeBuilder is present and that there are y values to be estimated, so the code would look somewhat obscure.

Suggestions on which route to take?
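To make the first route concrete, here is a rough sketch of the checks described in this comment, written as a standalone Python function. The name `check_apply_input` is hypothetical, and this is not the code in _tree.pyx or in validation.check_array(); it only mirrors the conditions listed above (32-bit floats, CSC for sparse input, 32-bit integer indices).

```python
import numpy as np
from scipy import sparse


def check_apply_input(X):
    """Hypothetical helper mirroring the checks described in this comment."""
    if sparse.issparse(X):
        # Convert any sparse format to CSC with 32-bit float data.
        X = X.tocsc().astype(np.float32)
        # The edge case: the index arrays must be 32-bit (C int) integers.
        if X.indices.dtype != np.intc or X.indptr.dtype != np.intc:
            raise ValueError("sparse indices must be 32-bit integers")
        return X
    # Dense input: a C-contiguous 2-D array of 32-bit floats.
    X = np.ascontiguousarray(X, dtype=np.float32)
    if X.ndim != 2:
        raise ValueError("expected a 2-D array")
    return X
```

Keeping the check in Python avoids touching the cdef declaration, at the cost of duplicating logic that already lives in the Cython code.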

glouppe · 2015-01-08T08:08:26Z

> There is currently no function to do the one-hot encoding automatically. Should we consider adding one?

I would not do this automatically, but let the user decide what to do with the leaf ids instead.

glouppe · 2015-01-08T08:09:48Z

sklearn/tree/tests/test_tree.py

@@ -1137,6 +1137,8 @@ def check_explicit_sparse_zeros(tree, max_depth=3,
    Xs = (X_test, X_sparse_test)
    for X1, X2 in product(Xs, Xs):
        assert_array_almost_equal(s.tree_.apply(X1), d.tree_.apply(X2))
+        assert_array_almost_equal(s.apply(X1), d.apply(X2))
+        assert_array_almost_equal(s.apply(X1), s.tree_.apply(X1))


Please write an independent test instead, one that only checks the correctness of apply.

Wait, the test is there, but it should be removed here, right?
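As a sketch of what such an independent test might look like (not the test that was eventually merged), the following checks apply() on its own. It assumes a scikit-learn version where `apply` is public and accepts CSR input; the function name `test_apply` and the synthetic data are illustrative only.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def test_apply():
    X, y = make_classification(n_samples=50, n_features=5, random_state=0)
    clf = DecisionTreeClassifier(random_state=0).fit(X, y)

    leaves = clf.apply(X)
    # One leaf index per sample.
    assert leaves.shape == (X.shape[0],)
    # Every returned node is a leaf: leaves have no left child (-1).
    assert np.all(clf.tree_.children_left[leaves] == -1)
    # Dense and sparse inputs must give identical results.
    assert np.array_equal(leaves, clf.apply(csr_matrix(X)))
    # The public method agrees with the low-level one on float32 input.
    assert np.array_equal(leaves, clf.tree_.apply(X.astype(np.float32)))
```

Unlike the lines added to check_explicit_sparse_zeros, this asserts properties of the result (shape, leaf membership) rather than only comparing the public and private methods.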

amueller · 2015-01-15T21:59:23Z

Can you please rebase?


amueller · 2015-03-09T22:23:28Z

There is a test error (sorry for the late reply).

amueller · 2015-04-01T20:39:19Z

sklearn/tree/tests/test_tree.py

@@ -1268,3 +1270,38 @@ def check_min_weight_leaf_split_level(name):
 def test_min_weight_leaf_split_level():
    for name in ALL_TREES:
        yield check_min_weight_leaf_split_level, name
+<<<<<<< HEAD


Merge error here.

glouppe · 2015-04-11T13:57:52Z

Fixed by #4488

galv added 5 commits January 7, 2015 13:03

Make apply method of trees public. Added test for concistency with pr…

cd6a1a6

…ivate method.

Added docstring

e8928c9

Added example demonstrating tree.apply

f2e9ec7

Added indentation to docstring

3777046

Removed cruft

5e7b51f

glouppe reviewed Jan 8, 2015

Added tests of apply() for valid and invalid inputs. Fixed style.

41e2aef

amueller changed the title ~~Tree apply~~ [MRG] Tree apply Jan 15, 2015

galv added 2 commits January 25, 2015 16:47

Fixed frivolous conflict with master

df20027

Merge branch 'master' of https://github.com/scikit-learn/scikit-learn …

25afee2

…into tree_apply

amueller reviewed Apr 1, 2015

glouppe mentioned this pull request Apr 2, 2015

[MRG + 1] Public apply method for decision trees #4488

Merged

glouppe closed this Apr 11, 2015
