Random Forest


History of Decision Trees

• The first regression tree algorithm

 “Automatic Interaction Detection (AID)” [Morgan & Sonquist, 1963]
• The first classification tree algorithm
 “Theta Automatic Interaction Detection (THAID)” [Messenger & Mandel,
• Decision trees become popular
 “Classification and regression trees (CART)” [Breiman et al., 1984]
• Introduction of the ID3 algorithm
 “Induction of Decision Trees” [Quinlan, 1986]
• Introduction of the C4.5 algorithm
 “C4.5: Programs for Machine Learning” [Quinlan, 1993]
Random Forests
(Ensemble learning with decision trees)

• Random Forests are an ensemble learning method that employs
decision tree learning to build multiple trees through bagging and
the random subspace method.
Ensemble Learning
• Ensemble Learning:
 A method that combines multiple learning algorithms to obtain
better performance than any of its components
• Random Forests are one of the most common examples of ensemble
learning
• Other commonly-used ensemble methods:
 Bagging: multiple models on random subsets of data samples
 Random Subspace Method: multiple models on random subsets of
features
 Boosting: train models iteratively, while making the current model
focus on the mistakes of the previous ones by increasing the weight
of misclassified samples
Random Forests
• Random Forests:
 Instead of building a single decision tree and using it to make predictions,
build many slightly different trees and combine their predictions
• We have a single data set, so how do we obtain slightly different trees?
1. Bagging (Bootstrap Aggregating):
 Take random subsets of data points from the training set to create N smaller data
sets
 Fit a decision tree on each subset

2. Random Subspace Method (also known as Feature Bagging):

 Fit N different decision trees by constraining each one to operate on a random
subset of features
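The two sampling steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the helper names `bootstrap_sample` and `random_feature_subset`, the toy data, and the choice to draw rows with replacement are all assumptions made for the example, and the tree-fitting step itself is left out.

```python
import random

def bootstrap_sample(rows, size, rng):
    """Bagging: draw `size` rows at random (assumed here: with replacement)."""
    return [rng.choice(rows) for _ in range(size)]

def random_feature_subset(n_features, k, rng):
    """Random subspace method: pick k distinct feature indices."""
    return sorted(rng.sample(range(n_features), k))

rng = random.Random(0)

# Hypothetical toy training set: each row is (feature vector, label).
training_set = [
    ([5.1, 3.5, 1.4], 0),
    ([6.2, 2.9, 4.3], 1),
    ([4.9, 3.0, 1.4], 0),
    ([6.7, 3.1, 4.7], 1),
]

n_trees = 3
for t in range(n_trees):
    rows = bootstrap_sample(training_set, size=3, rng=rng)
    feats = random_feature_subset(n_features=3, k=2, rng=rng)
    # Keep only the chosen feature columns for this tree.
    projected = [([x[j] for j in feats], y) for x, y in rows]
    # A decision tree would be fit on `projected` at this point.
    print(f"tree {t}: features {feats}, rows {projected}")
```

Because every tree sees different rows and different columns, the fitted trees differ from one another even though they all come from the same training set.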
Bagging at training time
[Figure: the training set is sampled into N subsets, one per tree.]
Bagging at inference time
[Figure: a test sample is classified by the ensemble with 75% confidence.]
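One common way to arrive at a confidence such as 75% is to let every tree vote and report the share of votes won by the most popular class. The slides do not state how the figure was computed, so the sketch below is an assumption; the function name `majority_vote` and the class labels are hypothetical.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most common label and the fraction of trees that chose it."""
    label, votes = Counter(predictions).most_common(1)[0]
    return label, votes / len(predictions)

# Hypothetical predictions from four trees for one test sample.
label, confidence = majority_vote(["cat", "cat", "dog", "cat"])
print(label, confidence)  # cat 0.75
```

With three of four trees agreeing, the ensemble predicts that class with 75% confidence.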
Random Subspace Method at training time
[Figure: the training data, with each tree fit on a random subset of features.]
Random Subspace Method at inference time
[Figure: a test sample is classified by the ensemble with 66% confidence.]
Random Forests
[Figure: a Random Forest made up of Tree 1, Tree 2, ..., Tree N.]
A Random Forest can perform both classification
and regression tasks.
It is capable of handling large datasets with high
It typically improves on the accuracy of a single tree and reduces
overfitting.
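The difference between the two tasks shows up in how the trees' outputs are combined: class labels are usually put to a vote, while numeric predictions are usually averaged. The sketch below illustrates that idea only; the function name `aggregate` and the `task` strings are hypothetical.

```python
from collections import Counter
from statistics import mean

def aggregate(tree_outputs, task):
    """Combine per-tree outputs into one forest prediction."""
    if task == "classification":
        # Most frequently predicted class label wins.
        return Counter(tree_outputs).most_common(1)[0][0]
    if task == "regression":
        # Average of the trees' numeric predictions.
        return mean(tree_outputs)
    raise ValueError(f"unknown task: {task}")

print(aggregate([1, 1, 0], "classification"))    # 1
print(aggregate([2.0, 4.0, 6.0], "regression"))  # 4.0
```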
History of Random Forests

• Introduction of the Random Subspace Method

 “Random Decision Forests” [Ho, 1995] and “The Random Subspace
Method for Constructing Decision Forests” [Ho, 1998]

• Combination of the Random Subspace Method with Bagging, and
introduction of the term Random Forest (a trademark of Leo Breiman
and Adele Cutler)
 “Random Forests” [Breiman, 2001]
[Figure: boosting. All samples start with the same weight. After each
model is trained, the samples are reweighted based on the current
model's mistakes, and the next model sees the weighted samples.]
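The reweighting step of boosting can be made concrete with a small sketch. The slides do not name a specific algorithm, so the update rule below borrows the AdaBoost formula as an assumption; the function name `reweight` and the toy mistake pattern are hypothetical.

```python
import math

def reweight(weights, mistakes):
    """AdaBoost-style update: raise the weight of misclassified samples.

    weights  -- current sample weights, summing to 1
    mistakes -- True where the current model misclassified the sample
    """
    # Weighted error of the current model, clipped away from 0 and 1.
    err = sum(w for w, m in zip(weights, mistakes) if m)
    err = min(max(err, 1e-10), 1 - 1e-10)
    alpha = 0.5 * math.log((1 - err) / err)
    # Scale mistakes up and correct samples down, then renormalize.
    scaled = [w * math.exp(alpha if m else -alpha)
              for w, m in zip(weights, mistakes)]
    total = sum(scaled)
    return [w / total for w in scaled]

# All samples start with the same weight.
weights = [0.25, 0.25, 0.25, 0.25]
# Suppose the first model gets only the last sample wrong.
weights = reweight(weights, [False, False, False, True])
print([round(w, 4) for w in weights])  # [0.1667, 0.1667, 0.1667, 0.5]
```

After the update, the single misclassified sample carries half of the total weight, so the next model is pushed to get it right.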
• Ensemble Learning methods combine multiple learning algorithms to
obtain performance improvements over their components
• Commonly-used ensemble methods:
 Bagging (multiple models on random subsets of data samples)
 Random Subspace Method (multiple models on random subsets of
features)
 Boosting (train models iteratively, while making the current model
focus on the mistakes of the previous ones by increasing the weight
of misclassified samples)
• Random Forests are an ensemble learning method that employs decision
tree learning to build multiple trees through bagging and the random
subspace method.
 They reduce the overfitting that single decision trees are prone to.
