-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[MRG + 1] Public apply
method for decision trees
#4488
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
apply
method for decision treesapply
method for decision trees
apply
method for decision treesapply
method for decision trees
Still have to look at the example... |
|
apply
method for decision treesapply
method for decision trees
I improved the example a bit, but I am not very happy with it at all in fact. I would vote for removing it. What do you think? Note that we already have an example showing the capabilities of trees for embedding. |
2e851cf
to
9d4bb37
Compare
I was looking at this yesterday and thought of rebasing it :) Thanks for picking it up. I'll have a look later. |
Do you have the picture genreated by the example? |
|
||
Note that in the below double bar graph demonstrating the first transformation, | ||
all setosas fall into the first leaf, and none into the second leaf. | ||
Similarly, all versicolors and virginicas fall only into the second leaf. This |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are some double spaces in this and the previous line.
This is the plot that the example produces, it doesn't say much to me. I agree it's not the most convincing example. I think the "sparse" feature extraction point is not well made by using a tree with only two leaves. As for part 1 of the example, I am not convinced why I would want to transform the classification problem that I care about into one where I'd just predict the noisy leaves of a decision tree trained on the original problem. The only reason I can find is to do model compression, but why would I compress a tree? Maybe a whole forest. How about an example that focuses on the second part, but actually creates a |
Come to think of it, given the digits embedding example referenced by @glouppe, the only usefulness of a |
+1 on removing the example. The GradientBoostingClassifier doesn't have apply (as it is not necessarily tree-based) so we could use the tree apply. |
X_small32 = X_small.astype(tree._tree.DTYPE) | ||
|
||
est = ALL_TREES[name]() | ||
est.fit(X_small32, y_small) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use X_small here
Ah, this would be a great example indeed. GB not being necessarily tree-based is a great reason to provide this as an example rather than as a transformer in the library. In the old PR, @ogrisel pointed to a notebook where he used this: with a link to this paper: Practical Lessons from Predicting Clicks on Ads at Facebook Junfeng Pan, He Xinran, Ou Jin, Tianbing XU, Bo Liu, Tao Xu, Yanxin Shi, Antoine Atallah, Ralf Herbrich, Stuart Bowers, Joaquin Quiñonero Candela International Workshop on Data Mining for Online Advertising (ADKDD) |
hehe yeah Ralf Herbrich was my boss at Amazon, which is why I thought about it ;) |
X_trans = np.zeros((len(X), max_leaves)) | ||
|
||
for i in range(max_leaves): | ||
X_trans[:, i] = y_reduced == (i + 1) # Leaf indices begin at 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realized that this is incorrect and would break with deeper trees given @amueller 's documentation observation below.
Sorry for the delay. I removed the example, improved tests and now raise NotFittedError instead of ValueError. |
apply
method for decision treesapply
method for decision trees
LGTM. whatsnew? |
Done :) |
Opened #4549 in case anyone is bored ;) |
A last review? Should be quick, there is not much. CC: @vene |
I improved the documentation. Merging. |
[MRG + 1] Public `apply` method for decision trees
This supersedes #4065. I rebased @galv branch on top of master and rewrote the tests.
Fixes ##3832.