Apply method for trees #3832
Comments
+1 and for GB models as well. |
Why would it be useful to users? Do you have some applications (examples?) in mind? |
Making it easier to do this kind of transform (see the end of the notebook): |
If my quick search of your document is right, what you want is a RandomTreesEmbedding that is not built from totally randomized trees. I am +1 for this idea. :-) But this doesn't seem to be an example / application for this issue. |
I think we should add an example :) |
Briefly looking at Olivier's notebook, I have seen Jerome Friedman speak of a similar model as "rule ensembles", wherein one uses randomised trees as a means of extracting feature combinations that can then be weighted with logistic regression or similar. I do not recall the details of his 2008 paper (or 2005 tech report) on the topic, but from his presentation, I gathered these rules could be the path to any node (from the root, or from any node, I can't recall), not only to a leaf. In any case, it is a little different from what's given above. I think it's a nice idea in terms of producing models that can be understood. |
But I would also be unsurprised if it's a technique with many parallel reinventions... |
Yes, @pprett also mentioned over Twitter that RuleFit leverages sub-paths starting from the root as categorical features, instead of just the leaves (the full path from the root). |
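For readers following along, here is a rough sketch of the leaf-based variant discussed above: the leaf reached in each tree of a forest is treated as a categorical feature, one-hot encoded, and weighted with logistic regression. This is not the notebook's code and not RuleFit (it uses leaves only, not sub-paths). The dataset, forest size and depth are arbitrary choices for illustration, and fitting both stages on the same data is a simplification.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder

X, y = load_iris(return_X_y=True)

# Small forest of shallow trees; the sizes are illustrative assumptions.
forest = RandomForestClassifier(n_estimators=10, max_depth=3, random_state=0)
forest.fit(X, y)

# For each sample, the index of the leaf it reaches in each tree:
# shape (n_samples, n_estimators).
leaves = forest.apply(X)

# Treat each tree's leaf index as a categorical feature and one-hot encode it.
encoder = OneHotEncoder(handle_unknown="ignore")
leaf_features = encoder.fit_transform(leaves)

# Weight the leaf indicators with a linear model.
linear = LogisticRegression(max_iter=1000)
linear.fit(leaf_features, y)
```

Each row of `leaf_features` has exactly one active indicator per tree, so the linear model learns one weight per (tree, leaf) pair. In practice the forest and the linear model would be fit on separate splits.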
Sorry to jump in on this issue unannounced, but it seemed related to something I wanted to do. I'm hoping to use sklearn to get predicted leaf node ids. I think I'm using the apply method, like @ogrisel does in his notebook, as follows:
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
iris = load_iris()
clf = DecisionTreeClassifier(random_state=0)
mdl = clf.fit(iris.data, iris.target)
mdl.tree_.apply(iris.data)
However, when I do that, I'm getting the following error:
I'm also getting the same error with a call to the first tree of a random forest:
from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier()
mdl = clf.fit(iris.data, iris.target)
mdl.estimators_[0].tree_.apply(iris.data)
Do I need to re-type the data somehow in order to get this to work? Thanks in advance for any help! |
I think the development version should give a more specific error message. (In reply to davidcieslak, 7 January 2015 at 09:53.) |
You can work around this by re-typing to float32. |
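A minimal sketch of that workaround, assuming the low-level `tree_` object only accepts a float32 NumPy array (the variable names `X32` and `leaf_ids` are just for illustration):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

# iris.data is float64; cast it to float32 before calling the
# low-level Tree.apply directly.
X32 = iris.data.astype(np.float32)
leaf_ids = clf.tree_.apply(X32)  # one leaf node id per sample
```

The same cast should apply to the random forest case, for example `mdl.estimators_[0].tree_.apply(X32)`.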
Brilliant. Thanks all! |
For the record, it's true that this method has multiple reinventions. It's used in speech recognition for a slightly different purpose: to cluster the hidden Markov models (HMMs) for triphones that are seen few or no times in the training data (every HMM corresponds to exactly one triphone). HMMs in the same cluster share parameters, so that more robust estimates of these parameters can be made. |
Fixed by #4488 |
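For anyone landing here later: assuming a scikit-learn release that ships the public `apply` method on tree estimators (which is what this issue asks for), the leaf ids can be obtained without touching `tree_` and without re-typing the input. A short sketch:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

# Public method on the estimator; the float64 input is validated
# and converted internally, so no manual cast is needed.
leaf_ids = clf.apply(iris.data)
```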
I think it would be nice to add an apply method to the tree. Currently there is one in the RandomForest, but not in the tree. There is one in tree.tree_, but the tree object is not publicly documented. I think the idea was that we might want to change the structure of the tree object, so we don't make it public. Still, we could provide a public interface to the lower-level functions so that people might find them more easily.
Opinions?