Apply method for trees #3832

amueller · 2014-11-06T15:46:09Z

I think it would be nice to add an apply method to the tree. Currently there is one in the RandomForest, but not in the tree. There is one in tree.tree_, but the tree object is not publicly documented. I think the idea was that we might want to change the structure of the tree object, so we don't make it public.
Still, we could provide a public interface to the lower-level functions so that people can find them more easily.

Opinions?

ogrisel · 2014-11-21T10:06:44Z

+1 and for GB models as well.

arjoly · 2014-11-21T11:44:21Z

Why would it be useful to users? Do you have some applications (examples?) in mind?

ogrisel · 2014-11-21T12:04:47Z

Making it easier to do this kind of transform (see the end of the notebook):

http://nbviewer.ipython.org/github/ogrisel/notebooks/blob/master/sklearn_demos/Income%20classification.ipynb

arjoly · 2014-11-21T12:41:29Z

If my quick search of your notebook is right, what you want is a RandomTreesEmbedding that is not built from totally randomized trees. I am +1 for this idea. :-)

This doesn't seem to be an example / application for this issue.

amueller · 2014-11-21T15:27:04Z

I think we should add an example :)

jnothman · 2014-11-22T12:24:27Z

Having looked briefly at Olivier's notebook: I have seen Jerome Friedman speak of a similar model as "rule ensembles", wherein one uses randomised trees as a means of extracting feature combinations that can then be weighted with logistic regression et al. I do not recall the details of his 2008 paper (or 2005 tech report) on the topic, but from his presentation I gathered that these rules could be the path to any node (from the root, or from any node, I can't recall), not only to a leaf. In any case, it is a little different from what's given above. I think it's a nice idea in terms of producing models that can be understood.

jnothman · 2014-11-22T12:25:44Z

But I would also be unsurprised if it's a technique with many parallel reinventions...

ogrisel · 2014-11-23T18:17:41Z

Yes, @pprett also mentioned on Twitter that RuleFit leverages sub-paths starting from the root as categorical features, instead of just the leaves (full paths from the root).
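
For readers who have not opened the notebook, here is a minimal sketch of the leaf-only variant discussed above: leaf indices from a forest are one-hot encoded and fed to a logistic regression. It is not the notebook's code; the iris data, the hyperparameters, and the variable names are assumptions chosen for illustration, and it does not cover the RuleFit-style sub-path features.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder

X, y = load_iris(return_X_y=True)

# Fit a small forest; each tree partitions the feature space into leaves.
forest = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

# apply() returns, for every sample, the index of the leaf it reaches in each tree.
leaves = forest.apply(X)  # shape (n_samples, n_estimators)

# Treat each (tree, leaf) pair as a categorical feature and one-hot encode it.
encoder = OneHotEncoder(handle_unknown="ignore")
leaf_features = encoder.fit_transform(leaves)  # sparse, one active column per tree

# Weight the leaf indicators with a linear model.
linear = LogisticRegression(max_iter=1000).fit(leaf_features, y)
```

In practice the forest and the linear model would usually be fitted on separate splits of the data to limit overfitting; the sketch reuses the same data only for brevity.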

davidcieslak-zz · 2015-01-06T22:53:19Z

Sorry to jump in on this issue unannounced, but it seemed related to something I wanted to do. I'm hoping to use sklearn to get predicted leaf node ids. I think I'm using the apply method -- like @ogrisel does in his notebook -- as follows:

from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
iris = load_iris()
clf = DecisionTreeClassifier(random_state=0)
mdl = clf.fit(iris.data, iris.target)
mdl.tree_.apply(iris.data)

However, when I do that, I'm getting the following error:

    print mdl.tree_.apply(iris.data)
  File "_tree.pyx", line 2382, in sklearn.tree._tree.Tree.apply (sklearn/tree/_tree.c:19595)
ValueError: Buffer dtype mismatch, expected 'DTYPE_t' but got 'double'

I'm also getting the same error with a call to the first tree of a random forest:

from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier()
mdl = clf.fit(iris.data, iris.target)
mdl.estimators_[0].tree_.apply(iris.data)

Do I need to re-type the data somehow in order to get this to work? Thanks in advance for any help!

jnothman · 2015-01-06T23:08:56Z

I think the development version should give a more specific error message, along the lines of "X.dtype should be np.float32, got np.float64". But yes, this is another reason not to require users to call Tree.apply directly.

amueller · 2015-01-06T23:29:57Z

You can work around this by re-typing to float32.
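
A minimal sketch of that workaround, reusing the iris example from the earlier comment. The variable names `X32` and `leaf_ids` are illustrative assumptions, and `tree_` remains a low-level object whose interface may change.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

# load_iris returns float64 data, but the low-level Tree.apply expects float32,
# so cast explicitly before calling it.
X32 = np.asarray(iris.data, dtype=np.float32)

# One leaf node id per sample.
leaf_ids = clf.tree_.apply(X32)
```

The same cast should apply to the individual trees of a forest, for example `mdl.estimators_[0].tree_.apply(X32)`.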

davidcieslak-zz · 2015-01-06T23:36:22Z

Brilliant. Thanks all!

galv · 2015-01-08T03:32:54Z

For the record, it's true that this method has multiple reinventions. It's used in speech recognition for a slightly different purpose: to cluster the hidden Markov models (HMMs) for triphones (there is a one-to-one correspondence between HMMs and triphones) that are seen few or no times in the training data, such that HMMs in the same cluster share parameters, so that more robust estimates of these parameters can be made.

glouppe · 2015-04-11T13:58:02Z

Fixed by #4488
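
For reference, a short sketch of what the public method looks like from the user's side, assuming a scikit-learn release that includes the `apply` method on decision tree estimators (the thread does not say which release that is). The data and variable names are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

# The estimator-level apply() validates and converts the input itself,
# so float64 data can be passed without a manual cast.
leaf_ids = clf.apply(iris.data)
```

Unlike the call on `tree_`, this goes through the estimator's input validation, which is why no dtype error is expected here.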

amueller added the Easy and Enhancement labels Nov 6, 2014

ogrisel mentioned this issue Nov 21, 2014

n_jobs support in GradientBoostingClassifier #3628

Closed

galv mentioned this issue Jan 8, 2015

[MRG] Tree apply #4065

Closed

glouppe mentioned this issue Apr 2, 2015

[MRG + 1] Public apply method for decision trees #4488

Merged

glouppe closed this as completed Apr 11, 2015
