Random Forest
[Figure: Bagging. At training time, each model is fit on a random bootstrap sample of the training set; at inference time, a test sample is classified by every model and their votes are combined (75% confidence in the example).]
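To make the voting step concrete, here is a minimal bagging sketch in Python; the four-model ensemble, the DecisionTreeClassifier base learner, and the synthetic data are illustrative assumptions, not part of the original slides.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    X, y = make_classification(n_samples=200, n_features=10, random_state=0)

    # Training time: fit each model on a bootstrap sample (drawn with replacement).
    models = []
    for _ in range(4):
        idx = rng.integers(0, len(X), size=len(X))  # bootstrap indices
        models.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))

    # Inference time: every model votes on the test sample; the fraction of
    # agreeing votes serves as the confidence (e.g., 3 of 4 votes -> 75%).
    x_test = X[:1]
    votes = np.array([m.predict(x_test)[0] for m in models])
    pred = np.bincount(votes).argmax()
    print(f"prediction={pred}, confidence={np.mean(votes == pred):.0%}")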
[Figure: Random Subspace Method at training time. Each model is trained on a random subset of the features of the training data.]
[Figure: Random Subspace Method at inference time. A test sample is classified by each model using its own feature subset, and the votes are combined (66% confidence in the example).]
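A matching sketch of the random subspace method, under the same illustrative assumptions; the only change from bagging is that each model sees a random subset of the features rather than a bootstrap of the samples, at both training and inference time.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(1)
    X, y = make_classification(n_samples=200, n_features=10, random_state=0)

    # Training time: each model is trained on its own random subset of features.
    subspaces, models = [], []
    for _ in range(3):
        feats = rng.choice(X.shape[1], size=5, replace=False)
        subspaces.append(feats)
        models.append(DecisionTreeClassifier(random_state=0).fit(X[:, feats], y))

    # Inference time: each model votes using only its own feature subset
    # (e.g., 2 of 3 agreeing votes -> roughly 66% confidence).
    x_test = X[:1]
    votes = np.array([m.predict(x_test[:, f])[0] for m, f in zip(models, subspaces)])
    pred = np.bincount(votes).argmax()
    print(f"prediction={pred}, confidence={np.mean(votes == pred):.0%}")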
Random Forests
[Figure: A Random Forest. An ensemble of N decision trees (Tree 1, Tree 2, …, Tree N) whose individual predictions are combined into a single output.]
Advantages:
• Random Forest can perform both classification and regression tasks.
• It can handle large datasets with high dimensionality.
• It improves the accuracy of the model and reduces overfitting (see the usage sketch below).
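As a usage sketch, scikit-learn's RandomForestClassifier combines both ideas, bagging the samples and drawing a random feature subset at each split (per split rather than per tree); the dataset and parameter values below are illustrative assumptions.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # n_estimators = number of trees; max_features = size of the random
    # feature subset considered at each split ("sqrt" is a common choice).
    forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
    forest.fit(X_tr, y_tr)
    print("test accuracy:", forest.score(X_te, y_te))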
History of Random Forests
Boosting
[Figure: Boosting. Models are trained sequentially; after each round, the training samples are reweighted based on the current model's mistakes, so that the next model focuses on them.]
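A minimal AdaBoost-style sketch of this reweighting loop; the decision-stump base learner, the ten rounds, and the synthetic data are illustrative assumptions.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=200, n_features=10, random_state=0)
    y_pm = 2 * y - 1                    # relabel classes as -1/+1
    w = np.full(len(X), 1 / len(X))     # start from uniform sample weights
    models, alphas = [], []

    for _ in range(10):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = 2 * stump.predict(X) - 1
        err = np.sum(w[pred != y_pm])                    # weighted training error
        alpha = 0.5 * np.log((1 - err) / (err + 1e-10))  # weight of this model
        w *= np.exp(-alpha * y_pm * pred)  # increase weights of misclassified samples
        w /= w.sum()                       # renormalize to a distribution
        models.append(stump)
        alphas.append(alpha)

    # Final prediction: a weighted vote over all models.
    scores = sum(a * (2 * m.predict(X) - 1) for m, a in zip(models, alphas))
    print("training accuracy:", np.mean(np.sign(scores) == y_pm))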
Summary
• Ensemble Learning methods combine multiple learning algorithms to
obtain performance improvements over their individual components
• Commonly-used ensemble methods:
  Bagging (multiple models trained on random subsets of the data samples)
  Random Subspace Method (multiple models trained on random subsets of
  the features)
  Boosting (models trained iteratively, each new model focusing on the
  mistakes of the previous ones by increasing the weights of
  misclassified samples)
• Random Forests are an ensemble learning method that employs decision
tree learning to build multiple trees through bagging and the random
subspace method.
They mitigate the overfitting problem of individual decision trees!