Trees, Boosting, and Random Forest
Kevin Song
Pros of decision trees:
- High interpretability; not a black box (e.g., "Exactly why was my loan application declined?").
- To arrive at the model's conclusion, just trace down the branches of the tree (see the sketch after this list).
- Fast to train and not computationally intensive on large datasets (unlike neural networks, for instance).
- Accepts both categorical and quantitative inputs (unlike neural networks, which accept only numerical data).
- Can be used with sparse data containing missing values.
- Feature selection is built into the model.
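To make the interpretability point concrete, here is a minimal sketch using scikit-learn (the library choice, the iris dataset, and the max_depth=3 setting are illustrative assumptions, not from the slides):

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    iris = load_iris()
    # A shallow tree keeps the printed rule set short and readable.
    tree = DecisionTreeClassifier(max_depth=3, random_state=0)
    tree.fit(iris.data, iris.target)

    # export_text renders the fitted tree as nested if/else splits;
    # any individual prediction can be explained by tracing one
    # root-to-leaf path through these rules.
    print(export_text(tree, feature_names=iris.feature_names))

Every prediction corresponds to exactly one printed root-to-leaf path, which is the "trace down the branches" explanation described above.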
Figure: Decision boundaries. Top row: true linear boundary; bottom row: true non-linear boundary. Left column: linear model; right column: tree-based model. (Adapted from Profs. Rob Tibshirani and Trevor Hastie.)
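As a rough numerical analogue of the figure (the synthetic datasets, model choices, and cross-validated scoring below are assumptions for illustration): a linear model should win when the true boundary is linear, while a tree should win on an XOR-style non-linear boundary.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)

    # True linear boundary: class is the sign of x1 + x2.
    X_lin = rng.normal(size=(500, 2))
    y_lin = (X_lin[:, 0] + X_lin[:, 1] > 0).astype(int)

    # True non-linear (XOR-style) boundary: class is the sign of x1 * x2.
    X_xor = rng.normal(size=(500, 2))
    y_xor = (X_xor[:, 0] * X_xor[:, 1] > 0).astype(int)

    for name, X, y in [("linear boundary", X_lin, y_lin),
                       ("XOR boundary", X_xor, y_xor)]:
        for model in [LogisticRegression(),
                      DecisionTreeClassifier(max_depth=5, random_state=0)]:
            acc = cross_val_score(model, X, y, cv=5).mean()
            print(f"{name}: {type(model).__name__} accuracy ~ {acc:.2f}")

On the linear problem the logistic regression should score near perfectly and the tree slightly worse (it approximates the diagonal boundary with axis-aligned steps); on the XOR problem the logistic regression should score near chance while the tree does well, mirroring the two rows of the figure.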
How can we improve decision tree algorithms?
By using ensembles of trees.
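A minimal sketch of that idea (scikit-learn and the breast-cancer dataset are assumptions chosen purely for illustration): bagging (random forest) averages many de-correlated trees, while boosting fits trees sequentially to the errors of the ensemble so far; both typically outperform a single tree.

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import (GradientBoostingClassifier,
                                  RandomForestClassifier)
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)

    models = {
        "single tree": DecisionTreeClassifier(random_state=0),
        "random forest": RandomForestClassifier(n_estimators=200,
                                                random_state=0),
        "gradient boosting": GradientBoostingClassifier(random_state=0),
    }

    # Cross-validated accuracy: the two ensembles usually beat the
    # lone tree on this task.
    for name, model in models.items():
        acc = cross_val_score(model, X, y, cv=5).mean()
        print(f"{name}: accuracy ~ {acc:.3f}")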