L04 Decision Trees
Decision Tree - Introduction
Source: https://medium.datadriveninvestor.com/the-basics-of-decision-trees-e5837cc2aba7
Terminology
+Root node : the first node in a decision tree, from which all splitting starts
+Splitting : the process of dividing a node into two or more sub-nodes, starting from the root node
+Node : an internal node produced by splitting, which can itself be split into further sub-nodes
+Leaf or terminal node : a node at the end of a branch that cannot be split any further
+Pruning : a technique that reduces the size of a decision tree by removing its sub-nodes. The aim is to reduce complexity, improve predictive accuracy, and avoid overfitting
+Branch / Sub-Tree : a subsection of the entire tree is called a branch or sub-tree (a minimal code sketch of these terms follows this list)
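These structural terms can be made concrete with a tiny data structure. The following is a minimal, hypothetical Python sketch (not taken from the slides): the root is the topmost node, internal nodes store the feature and threshold used for splitting, leaves store a prediction, and any node together with its descendants forms a branch / sub-tree.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TreeNode:
    """One node of a binary decision tree (illustrative only)."""
    feature: Optional[int] = None       # index of the feature used to split (None for a leaf)
    threshold: Optional[float] = None   # split threshold (None for a leaf)
    left: Optional["TreeNode"] = None   # left child / sub-node
    right: Optional["TreeNode"] = None  # right child / sub-node
    prediction: Optional[int] = None    # class predicted at a leaf (terminal node)

    def is_leaf(self) -> bool:
        # A leaf or terminal node has no children and cannot be split any further.
        return self.left is None and self.right is None

# Root node: the first node of the tree; each child and its descendants is a branch / sub-tree.
root = TreeNode(feature=0, threshold=4.5,
                left=TreeNode(prediction=0),    # terminal node
                right=TreeNode(prediction=1))   # terminal node
```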
Intuition
[Figure: a regression tree fit to the Hitters data. The root node splits on Years < 4.5; the branches end in terminal (leaf) nodes that predict mean log Salary (values 6.00 and 6.74 shown).]
• Note:
• Hitters data
• Salary is in mean log
• Hits is the number of hits the player made in the previous year
Splitting in Decision Trees
+split the nodes at the most informative features using the decision algorithm
+start at the tree root and split the data on the feature that results in the largest information gain (IG)
+the objective function to be maximized at each split is the information gain (IG)
Splitting in Decision Trees
• Select a feature and split the data into a binary tree
• Continue splitting with the available features (see the code sketch below) until:
  • Leaf node(s) are pure (only one class remains)
  • A maximum depth is reached
  • A performance metric is achieved
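The procedure above can be sketched in a few lines of Python. This is an illustrative, hypothetical implementation (not the slides' code): best_split tries every feature and threshold and keeps the split with the largest information gain (measured here with Gini impurity, defined on a later slide), and build_tree keeps splitting until a node is pure or a maximum depth is reached.

```python
import numpy as np

def gini(y):
    """Gini impurity of an integer label vector."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    """Return the feature/threshold pair with the largest information gain."""
    n_samples, n_features = X.shape
    parent_impurity = gini(y)
    best = {"gain": 0.0, "feature": None, "threshold": None}
    for f in range(n_features):
        for t in np.unique(X[:, f]):
            left, right = y[X[:, f] <= t], y[X[:, f] > t]
            if len(left) == 0 or len(right) == 0:
                continue
            # information gain = parent impurity minus weighted child impurities
            gain = (parent_impurity
                    - len(left) / n_samples * gini(left)
                    - len(right) / n_samples * gini(right))
            if gain > best["gain"]:
                best = {"gain": gain, "feature": f, "threshold": t}
    return best

def build_tree(X, y, depth=0, max_depth=3):
    """Recursively split until a node is pure or max_depth is reached."""
    if len(np.unique(y)) == 1 or depth == max_depth:
        return {"leaf": True, "prediction": int(np.bincount(y).argmax())}
    split = best_split(X, y)
    if split["feature"] is None:  # no split improves the impurity
        return {"leaf": True, "prediction": int(np.bincount(y).argmax())}
    mask = X[:, split["feature"]] <= split["threshold"]
    return {"leaf": False, "feature": split["feature"], "threshold": split["threshold"],
            "left": build_tree(X[mask], y[mask], depth + 1, max_depth),
            "right": build_tree(X[~mask], y[~mask], depth + 1, max_depth)}
```

A "performance metric is achieved" stopping rule could be added by comparing the best gain against a minimum threshold before splitting.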
Splitting in Decision Trees
+IG(D_p, f) = I(D_p) - \sum_{j=1}^{m} \frac{N_j}{N_p} I(D_j)    (1)
+f - the feature to perform the split
+D_p - the data set of the parent node
+D_j - the data set of the j-th child node
+I - the impurity measure
+N_p - the total number of samples at the parent node
+N_j - the number of samples in the j-th child node
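A hypothetical worked example of equation (1) (the numbers are illustrative, not from the slides): a parent node with N_p = 40 samples (30 of class 0, 10 of class 1) is split by a feature f into a left child with 30 samples (25 / 5) and a right child with 10 samples (5 / 5). Using Gini impurity (introduced below) as the impurity measure I:

```latex
\begin{aligned}
I(D_p)       &= 1 - \left(\tfrac{30}{40}\right)^2 - \left(\tfrac{10}{40}\right)^2 = 0.375 \\
I(D_{left})  &= 1 - \left(\tfrac{25}{30}\right)^2 - \left(\tfrac{5}{30}\right)^2 \approx 0.278 \\
I(D_{right}) &= 1 - \left(\tfrac{5}{10}\right)^2 - \left(\tfrac{5}{10}\right)^2 = 0.5 \\
IG(D_p, f)   &= 0.375 - \tfrac{30}{40} \cdot 0.278 - \tfrac{10}{40} \cdot 0.5 \approx 0.042
\end{aligned}
```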
Splitting in Decision Trees
+information gain is simply the difference between the impurity of the parent node and the weighted sum of the child node impurities
+the lower the impurity of the child nodes, the larger the information gain
+for simplicity and to reduce the combinatorial search space, most libraries (including scikit-learn) implement binary decision trees - each parent node is split into two child nodes, D_left and D_right
+IG(D_p, f) = I(D_p) - \frac{N_{left}}{N_p} I(D_{left}) - \frac{N_{right}}{N_p} I(D_{right})    (2)
+3 common impurity measures or splitting criteria are used in binary decision trees:
+Gini impurity (I_G), entropy (I_H), and misclassification error (I_E)
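A minimal sketch of these three criteria (illustrative code, not from the slides), each computed from the class proportions at a node:

```python
import numpy as np

def class_proportions(y):
    """Proportion of each class among the labels at a node."""
    _, counts = np.unique(y, return_counts=True)
    return counts / counts.sum()

def gini(y):
    """Gini impurity I_G = 1 - sum_k p_k^2."""
    p = class_proportions(y)
    return 1.0 - np.sum(p ** 2)

def entropy(y):
    """Entropy I_H = -sum_k p_k * log2(p_k)."""
    p = class_proportions(y)
    return -np.sum(p * np.log2(p))

def misclassification_error(y):
    """Misclassification error I_E = 1 - max_k p_k."""
    p = class_proportions(y)
    return 1.0 - p.max()

# Example node: 30 samples of class 0 and 10 of class 1
y = np.array([0] * 30 + [1] * 10)
print(gini(y), entropy(y), misclassification_error(y))  # ≈ 0.375, 0.811, 0.25
```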
Sample Dataset
+Simple simulation with the Heart Disease data set, which has 303 rows and 13 attributes. The target consists of 138 samples with value 0 and 165 samples with value 1
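A hedged sketch of how such a simulation could be set up with scikit-learn follows; the file name heart.csv and the column name target are assumptions about how the Heart Disease data is stored locally, not details given in the slides.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumed local copy of the Heart Disease data: 303 rows, 13 attributes plus a 'target' column.
df = pd.read_csv("heart.csv")
X = df.drop(columns=["target"])
y = df["target"]  # 138 samples with value 0, 165 samples with value 1

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# 'gini' or 'entropy' selects the splitting criterion discussed above.
clf = DecisionTreeClassifier(criterion="gini", random_state=42)
clf.fit(X_train, y_train)
print("train accuracy:", clf.score(X_train, y_train))
print("test accuracy: ", clf.score(X_test, y_test))
```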
• Variance is the variability of model predictions for a given data point, or a value that tells us the spread of our data
• A model with high variance pays a lot of attention to the training data and does not generalize to data it hasn't seen before
• Such models perform very well on training data but have high error rates on test data
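To illustrate this high-variance behaviour, a hypothetical experiment (illustrative only, using synthetic data rather than the slides' dataset) compares a fully grown tree with a depth-limited one; the fully grown tree typically scores near-perfectly on the training set but noticeably worse on held-out data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=13, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

for depth in [None, 3]:  # None = grow until leaves are pure (high variance); 3 = pre-pruned
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    print(f"max_depth={depth}: train={clf.score(X_train, y_train):.2f}, "
          f"test={clf.score(X_test, y_test):.2f}")
```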
Pruning Decision Trees