Understanding of Working of DECISION TREE CART Algorithm
CONTENTS
Applications
Steps involved in building CART
Hyperparameters
WHY CART?
CART (Classification And Regression Trees) was introduced by Leo Breiman, Jerome Friedman, Richard Olshen, and Charles Stone in 1984. It builds binary trees that handle both CLASSIFICATION (predicting a discrete class label) and REGRESSION (predicting a continuous value) with the same splitting framework.
Where is CART used?
Medical: Disease – Yes / No (classification); Disease – Severity (regression)
Banking: Loan – Yes / No (classification); Loan – Amount (regression)
Telecom: Pricing strategies – Tariffs
GINI IMPURITY
Gini = 1 − Σ pᵢ²  (summed over i = 1 … C)
C = number of classes
pᵢ = proportion of samples belonging to the ith class label
A pure node (all samples in one class) has Gini = 0; a 50/50 binary split has Gini = 0.5.
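The Gini impurity above can be sketched in a few lines of Python (`gini_impurity` is a hypothetical helper name, not part of any library):

```python
from collections import Counter

def gini_impurity(labels):
    """Gini = 1 - sum(p_i^2), where p_i is the proportion of class i."""
    n = len(labels)
    counts = Counter(labels)  # class label -> count
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

print(gini_impurity(["yes", "yes", "yes"]))       # pure node -> 0.0
print(gini_impurity(["yes", "no", "yes", "no"]))  # even split -> 0.5
```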
Regressor (sklearn.tree.DecisionTreeRegressor)
criterion – ("squared_error", "friedman_mse", "absolute_error", "poisson")
splitter – ("best", "random")
max_depth – (maximum depth of the tree)
min_samples_split – (minimum number of samples required to split an internal node)
min_samples_leaf – (minimum number of samples required to be at a leaf node)
max_leaf_nodes – (grow a tree with at most max_leaf_nodes, in best-first fashion)
max_features – (number of features to consider when looking for the best split)
Classifier (sklearn.tree.DecisionTreeClassifier)
criterion – ("gini", "entropy", "log_loss")
splitter – ("best", "random")
max_depth – (maximum depth of the tree)
min_samples_split – (minimum number of samples required to split an internal node)
min_samples_leaf – (minimum number of samples required to be at a leaf node)
max_leaf_nodes – (grow a tree with at most max_leaf_nodes, in best-first fashion)
max_features – ("sqrt", "log2", or a number; "auto" is deprecated in recent scikit-learn versions)
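The classifier side mirrors the regressor; a short sketch on the Iris dataset, again with illustrative hyperparameter values:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

clf = DecisionTreeClassifier(
    criterion="gini",     # Gini impurity as the split criterion
    splitter="best",
    max_depth=3,          # shallow tree, easy to interpret
    min_samples_leaf=5,
    random_state=0,
)
clf.fit(X, y)
print(clf.get_depth(), round(clf.score(X, y), 3))  # depth and training accuracy
```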
Advantages
Robust in the presence of outliers
Performs implicit variable selection
Models interactions between variables
Models local effects in the data
Models nonlinear relationships
Can handle missing values (via surrogate splits in the original CART algorithm)
Disadvantages
High variance: small changes in the training data can produce a very different tree, so a single tree is unstable.