[MRG+1] Trees as feature transformers #5037
Conversation
from sklearn.cross_validation import train_test_split
from sklearn.metrics import roc_curve

Nest = 10
Could you rename this to n_estimators?
Content related: do you think, as the example is now, people can understand what it is meant to show (if they didn't already know)? I was thinking of putting a comment by the gradient boosting part to say "this is the interesting bit, using a tree's apply …".

Build related: in a previous build a test in … — should I file an issue for those or are they known to be strange?
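A minimal sketch of what I take the "interesting bit" to be, namely the gradient boosting model's apply output (illustrative names and data, not the example's exact code):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=1000, random_state=0)

grd = GradientBoostingClassifier(n_estimators=10)
grd.fit(X, y)

# apply() gives the leaf index each sample reaches in every tree.
# For GradientBoostingClassifier the result has shape
# (n_samples, n_estimators, n_classes); for a binary problem the
# single column is selected with [:, :, 0].
leaves = grd.apply(X)[:, :, 0]
print(leaves.shape)  # (1000, 10)
```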
I think what I meant in #4549 was show a PCA on these vs on the original data? I'm not entirely following my wording, though.
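One possible reading of that PCA suggestion (purely a sketch of an interpretation, not something settled in this thread): project both the original features and the tree-based leaf encoding down to 2D and compare the pictures. Since the leaf encoding is sparse, TruncatedSVD stands in for PCA on that side:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA, TruncatedSVD
from sklearn.ensemble import RandomTreesEmbedding

X, y = make_classification(n_samples=1000, random_state=0)

# 2D projection of the original feature space.
X_pca = PCA(n_components=2).fit_transform(X)

# 2D projection of the sparse leaf-indicator space.
rt = RandomTreesEmbedding(n_estimators=10, max_depth=3, random_state=0)
X_leaves = rt.fit_transform(X)  # sparse one-hot matrix of leaf memberships
X_svd = TruncatedSVD(n_components=2).fit_transform(X_leaves)

print(X_pca.shape, X_svd.shape)  # (1000, 2) (1000, 2)
```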
@@ -0,0 +1,103 @@
"""Use trees to transform your features
You need to have a title like in the other plot examples which will be used in the example gallery.
Thanks for working on this! Can you post the plot here?

the image looks white to me...
Updated the plot, and added a reference in the ensemble section. If you can't quite remember the PCA comment, and I can't work it out either, should we skip it / make a second example?
looks good. Maybe it would be interesting to compare training the lr on the same training set vs a hold-out set? or at least mention that?
You mean like this (just showing the RF part here but I changed it for all of the models):

X, y = make_classification(n_samples=80000)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5)
X_train, X_train_lr, y_train, y_train_lr = train_test_split(X, y, test_size=0.5)
...
# Supervised transformation based on random forests
rf = RandomForestClassifier(max_depth=3, n_estimators=n_estimator)
rf_enc = OneHotEncoder()
rf_lm = LogisticRegression()
rf.fit(X_train, y_train)
rf_enc.fit(rf.apply(X_train))
rf_lm.fit(rf_enc.transform(rf.apply(X_train_lr)), y_train_lr)
...

Hard to tell if it makes a difference. What is the idea behind using a different dataset for fitting the lr instead of using the same as for fitting the trees? We can change it or add it, but you'll have to provide the sentence to motivate it as I don't know enough ;)
Well, the idea would be to be closer to "stacking". Intuitively, training both on the same set should lead to crazy overfitting, but maybe not. Let's keep it simple.
Ok. I looked at Section 8.8 of ESL but I'll have to read it a few more times before it sinks in. Changed the example to use different subsets for fitting the trees and the LR model.
Well, I'm not entirely sure what the "industry standard" is.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5)
# It is important to train the ensemble of trees on a different subset
# of the training data than the linear regression model to avoid overfitting
X_train, X_train_lr, y_train, y_train_lr = train_test_split(X, y, test_size=0.5)
Shouldn't you do train_test_split(X_train, y_train, test_size=0.5)?
Updated ROC curves. After discussing with @glouppe this morning I think it makes sense to split the training samples again. Updated the comment a bit to explain why/when this is important.
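For reference, a minimal sketch of the split as settled above, with the second split taken from the training half (the PR itself used sklearn.cross_validation, which later moved to sklearn.model_selection):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=80000)

# First hold out a test set ...
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5)
# ... then split the *training* half again: one part for fitting the tree
# ensembles, the other for fitting the linear model, to avoid overfitting.
X_train, X_train_lr, y_train, y_train_lr = train_test_split(
    X_train, y_train, test_size=0.5)
```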
Each sample goes through the decisions of each tree of the ensemble
and ends up in one leaf per tree. The sample is encoded by setting
feature values for these leafs to 1 and the other feature values to 0.
leafs -> leaves (?)
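To make the quoted description concrete, a minimal sketch of that encoding with a random forest (illustrative names, not the example's exact code):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import OneHotEncoder

X, y = make_classification(n_samples=1000, random_state=0)

rf = RandomForestClassifier(n_estimators=10, max_depth=3)
rf.fit(X, y)

# One leaf index per (sample, tree) pair ...
leaf_indices = rf.apply(X)  # shape (n_samples, n_estimators)

# ... one-hot encoded: the leaf a sample lands in gets a 1, every other
# leaf of that tree gets a 0, giving a sparse, high-dimensional encoding.
enc = OneHotEncoder()
X_leaves = enc.fit_transform(leaf_indices)
print(X_leaves.shape)
```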
Showing inverse false positive rate on a log scale vs true positive rate will "enhance" the difference. Also changed n_samples to 10k, n_estimators to 20, and the random seed to 10.

plt.plot(tpr_rt_lm, 1 / fpr_rt_lm, label='RT + LR')
plt.plot(tpr_rf, 1 / fpr_rf, label='RF')
plt.plot(tpr_rf_lm, 1 / fpr_rf_lm, label='RF + LR')
plt.plot(tpr_grd, 1 / fpr_grd, label='GBT')
plt.plot(tpr_grd_lm, 1 / fpr_grd_lm, label='GBT + LR')
plt.yscale('log')
plt.ylabel('Inverse false positive rate')
plt.xlabel('True positive rate')
plt.title('ROC curve')
plt.legend(loc='best')
plt.show()
With "enhanced" differences I would be a bit afraid of sending a message like "this is always better than that" when this is a very zoomed-in view of a single run on a particular synthetic dataset. |
(and currently it compares against a random forest using only part of the data, right?)
I have a slight preference for the plain ROC curve. The original intent of the example was to show how to use the individual tree's apply method. Right now it compares …
This example trains several tree-based ensemble methods and uses them to transform the data into a high-dimensional, sparse space. It then trains a linear model on this new feature space. The idea is taken from:

Practical Lessons from Predicting Clicks on Ads at Facebook. Junfeng Pan, He Xinran, Ou Jin, Tianbing XU, Bo Liu, Tao Xu, Yanxin Shi, Antoine Atallah, Ralf Herbrich, Stuart Bowers, Joaquin Quiñonero Candela. International Workshop on Data Mining for Online Advertising (ADKDD). https://www.facebook.com/publications/329190253909587/
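For readers of the thread, a condensed sketch of that idea with gradient boosted trees as the transformer (variable names are mine; the merged example may differ in its details):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

X, y = make_classification(n_samples=10000, random_state=10)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5)
X_train, X_train_lr, y_train, y_train_lr = train_test_split(
    X_train, y_train, test_size=0.5)

# Fit the trees on one part of the training data ...
grd = GradientBoostingClassifier(n_estimators=20)
grd.fit(X_train, y_train)

# ... encode each sample by the leaves it reaches ...
enc = OneHotEncoder()
enc.fit(grd.apply(X_train)[:, :, 0])

# ... and fit the linear model on the other part, in the leaf space.
lr = LogisticRegression()
lr.fit(enc.transform(grd.apply(X_train_lr)[:, :, 0]), y_train_lr)

# Scores for the held-out test set, e.g. for a ROC curve.
y_score = lr.predict_proba(enc.transform(grd.apply(X_test)[:, :, 0]))[:, 1]
```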
Shouldn't this at least link to the RandomTreesEmbedding documentation?
Yeah, probably. I don't think we usually link from examples to docs. You can always click on the class to get to the API doc.
By link I meant at least mention it :) This seems to be doing the same thing that estimator does, except with more control over the type of trees.
It is so similar that we even use RandomTreesEmbedding in the example.
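For comparison, a sketch of that built-in route (the unsupervised embedding already bundles the apply-plus-one-hot step, so no explicit encoder is needed):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomTreesEmbedding
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=1000, random_state=0)

# RandomTreesEmbedding fits totally random trees and directly returns the
# sparse one-hot leaf encoding, which the logistic regression consumes.
rt_lm = make_pipeline(
    RandomTreesEmbedding(n_estimators=10, max_depth=3, random_state=0),
    LogisticRegression())
rt_lm.fit(X, y)
```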
I think the example is good enough to be merged. It is a nice demonstration of the apply method.
Thanks!

Thanks! Now back to walking in the Swiss Alps: 🗻
This is an example using ensembles of trees to show how to use them to transform your samples into a high-dimensional, sparse feature space and then train a linear model on that. In particular, how to use the apply method of a DecisionTree. It came about from a discussion in #4488 and #4549. It is loosely based on @ogrisel's notebook here: http://nbviewer.ipython.org/github/ogrisel/notebooks/blob/master/sklearn_demos/Income%20classification.ipynb

In #4549 there is talk of showing the difference between a linear model and PCA. Not quite sure I got what you meant.
This could interest @amueller, @ogrisel and @vene.
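As a pointer for anyone skimming this thread, a tiny sketch of what the apply call returns on a single tree (the ensemble versions stack one such column per tree):

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=100, random_state=0)

tree = DecisionTreeClassifier(max_depth=3).fit(X, y)

# For each sample, apply() returns the index of the leaf node it ends up
# in after following the tree's decision path.
leaf_ids = tree.apply(X)
print(leaf_ids.shape)  # (100,)
```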