Decision Trees



Important Terminology

• Root Node: Represents the entire population or sample; this further gets divided into two or more homogeneous sets.
• Splitting: The process of dividing a node into two or more sub-nodes.
• Decision Node: When a sub-node splits into further sub-nodes, it is called a decision node.
• Leaf / Terminal Node: A node with no children (no further split) is called a leaf or terminal node.
• Pruning: When we reduce the size of a decision tree by removing nodes (the opposite of splitting), the process is called pruning.
• Branch / Sub-Tree: A sub-section of the decision tree is called a branch or sub-tree.
• Parent and Child Node: A node which is divided into sub-nodes is called the parent node of those sub-nodes, whereas the sub-nodes are the children of the parent node.
ID3 - Entropy

• A decision tree is built top-down from a root node and involves partitioning the data into subsets that contain instances with similar values (homogeneous).
• The ID3 algorithm uses entropy to calculate the homogeneity of a sample.
• If the sample is completely homogeneous, the entropy is zero; if the sample is equally divided, the entropy is one.
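
To make these two extremes concrete, here is a minimal Python sketch (not from the slides; the function name is illustrative) of entropy over a list of class labels:

from collections import Counter
from math import log2

def entropy(labels):
    # Shannon entropy (base 2) of a list of class labels.
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

print(entropy(["yes"] * 8))               # 0.0 -> completely homogeneous
print(entropy(["yes"] * 4 + ["no"] * 4))  # 1.0 -> equally divided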
ID3 - Entropy

• Entropy using the frequency table of one attribute (the target itself): E(S) = -Σ p(c) log2 p(c), summed over the classes c, where p(c) is the proportion of class c in S.
• Entropy using the frequency table of two attributes (the target against a candidate attribute X): E(S, X) = Σ P(v) * E(Sv), summed over the values v of X, where Sv is the subset of S with X = v.
ID3 Steps

• The information gain is based on the decrease in entropy after a dataset is split on an attribute.
• Constructing a decision tree is all about finding the attribute that returns the highest information gain (i.e., the most homogeneous branches).
• Step 1: Calculate the entropy of the target.
• Step 2: The dataset is then split on the different attributes. The entropy for each branch is calculated and added proportionally to get the total entropy for the split. The resulting entropy is subtracted from the entropy before the split; the result is the information gain, or decrease in entropy (see the sketch below).
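
A minimal sketch of Steps 1-2 (the name info_gain and the list-of-dicts row format are assumptions, reusing the entropy sketch above):

def info_gain(rows, attr, target):
    # Information gain from splitting `rows` (a list of dicts) on `attr`.
    parent = entropy([r[target] for r in rows])          # Step 1: entropy of the target
    weighted = 0.0
    for v in set(r[attr] for r in rows):
        branch = [r[target] for r in rows if r[attr] == v]
        weighted += (len(branch) / len(rows)) * entropy(branch)  # Step 2: proportional sum
    return parent - weighted                             # decrease in entropy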
ID3 Steps

• Step 3: Choose the attribute with the largest information gain as the decision node, divide the dataset by its branches, and repeat the same process on every branch (a selection sketch follows below).
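
With the hypothetical info_gain above, Step 3 reduces to an argmax over the candidate attributes:

def best_attribute(rows, attributes, target):
    # Attribute with the largest information gain on this subset of rows.
    return max(attributes, key=lambda a: info_gain(rows, a, target))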
ID3 Steps
• Step 4a: A branch with an entropy of 0 is a leaf node.
• Step 4b: A branch with an entropy of more than 0 needs further splitting.
• Step 5: The ID3 algorithm is run recursively on the non-leaf branches until all data is classified (a recursive sketch follows below).
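
Putting Steps 3-5 together, a compact recursive sketch built on the hypothetical helpers above (a pure branch becomes a leaf; a majority-class fallback is assumed when no attributes remain):

from collections import Counter  # already imported in the entropy sketch

def id3(rows, attributes, target):
    labels = [r[target] for r in rows]
    # Step 4a: a pure branch (entropy 0) is a leaf node.
    if len(set(labels)) == 1:
        return labels[0]
    # Assumed fallback: no attributes left, predict the majority class.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Step 3: split on the attribute with the largest information gain.
    best = best_attribute(rows, attributes, target)
    tree = {best: {}}
    for v in set(r[best] for r in rows):
        subset = [r for r in rows if r[best] == v]
        remaining = [a for a in attributes if a != best]
        # Steps 4b and 5: recurse on branches that still need splitting.
        tree[best][v] = id3(subset, remaining, target)
    return tree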
Example Revisited

• Consider the weather dataset, based on which we will determine whether to play football or not.
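
The dataset itself appears on the slide only as a table image. Every count used below (9 yes / 5 no overall, and the 3/2, 4/0, 2/3 split by Outlook) matches the standard 14-row play-tennis dataset, so, assuming that is the table, it can be written out for the sketches above as:

weather = [
    {"Outlook": "Sunny",    "Temperature": "Hot",  "Humidity": "High",   "Windy": False, "Play": "No"},
    {"Outlook": "Sunny",    "Temperature": "Hot",  "Humidity": "High",   "Windy": True,  "Play": "No"},
    {"Outlook": "Overcast", "Temperature": "Hot",  "Humidity": "High",   "Windy": False, "Play": "Yes"},
    {"Outlook": "Rainy",    "Temperature": "Mild", "Humidity": "High",   "Windy": False, "Play": "Yes"},
    {"Outlook": "Rainy",    "Temperature": "Cool", "Humidity": "Normal", "Windy": False, "Play": "Yes"},
    {"Outlook": "Rainy",    "Temperature": "Cool", "Humidity": "Normal", "Windy": True,  "Play": "No"},
    {"Outlook": "Overcast", "Temperature": "Cool", "Humidity": "Normal", "Windy": True,  "Play": "Yes"},
    {"Outlook": "Sunny",    "Temperature": "Mild", "Humidity": "High",   "Windy": False, "Play": "No"},
    {"Outlook": "Sunny",    "Temperature": "Cool", "Humidity": "Normal", "Windy": False, "Play": "Yes"},
    {"Outlook": "Rainy",    "Temperature": "Mild", "Humidity": "Normal", "Windy": False, "Play": "Yes"},
    {"Outlook": "Sunny",    "Temperature": "Mild", "Humidity": "Normal", "Windy": True,  "Play": "Yes"},
    {"Outlook": "Overcast", "Temperature": "Mild", "Humidity": "High",   "Windy": True,  "Play": "Yes"},
    {"Outlook": "Overcast", "Temperature": "Hot",  "Humidity": "Normal", "Windy": False, "Play": "Yes"},
    {"Outlook": "Rainy",    "Temperature": "Mild", "Humidity": "High",   "Windy": True,  "Play": "No"},
]

With this list, id3(weather, ["Outlook", "Temperature", "Humidity", "Windy"], "Play") reproduces the tree derived in the remaining slides, with Outlook at the root.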
First step - find the parent node for our decision tree

• Find the entropy of the class variable (all logs are base 2).

E(S) = -[(9/14)log(9/14) + (5/14)log(5/14)] = 0.94

• Calculate the average weighted entropy.

E(S, outlook) = (5/14)*E(3,2) + (4/14)*E(4,0) + (5/14)*E(2,3)
= (5/14)(-(3/5)log(3/5) - (2/5)log(2/5)) + (4/14)(0) + (5/14)(-(2/5)log(2/5) - (3/5)log(3/5)) = 0.693

• Find the information gain [the difference between the parent entropy and the average weighted entropy]: IG(S, outlook) = 0.94 - 0.693 = 0.247
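
As a quick sanity check on these numbers (an illustrative sketch using only the counts stated above):

from math import log2

def e(a, b):
    # Entropy of a two-class split with counts a and b.
    total = a + b
    return sum(-(c / total) * log2(c / total) for c in (a, b) if c)

print(e(9, 5))                                               # ≈ 0.940 -> E(S)
weighted = (5/14)*e(3, 2) + (4/14)*e(4, 0) + (5/14)*e(2, 3)
print(weighted)                                              # ≈ 0.694 -> E(S, outlook), the 0.693 above
print(e(9, 5) - weighted)                                    # ≈ 0.247 -> IG(S, outlook)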
First step - find the parent node for our decision tree

• Find the information gain for Temperature, Humidity and Windy:

• IG(S, Temperature) = 0.940 - 0.911 = 0.029

• IG(S, Humidity) = 0.940 - 0.788 = 0.152

• IG(S, Windy) = 0.940 - 0.8932 = 0.048


Step 2: select the feature having the largest information gain

• Here it is Outlook, so it forms the first node (root node) of our decision tree.

Since the Overcast branch contains only examples of class ‘Yes’, we can set it directly as a ‘Yes’ leaf.
Next step: find the next node in our decision tree

• Find the node under Sunny: determine which of Temperature, Humidity or Windy has the highest information gain.

Calculate the parent entropy E(sunny):

E(sunny) = -(3/5)log(3/5) - (2/5)log(2/5) = 0.971
Next step: find the next node in our decision tree

• Calculate the information gain of Temperature, IG(sunny, Temperature):

E(sunny, Temperature) = (2/5)*E(0,2) + (2/5)*E(1,1) + (1/5)*E(1,0) = 2/5 = 0.4

IG(sunny, Temperature) = 0.971 - 0.4 = 0.571

• Compute the information gain for the remaining attributes in the same way:

IG(sunny, Humidity) = 0.971
IG(sunny, Windy) = 0.020
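
Using the info_gain sketch and the assumed dataset listed earlier, these values can be checked directly (again illustrative, not from the slides):

sunny = [r for r in weather if r["Outlook"] == "Sunny"]
for attr in ("Temperature", "Humidity", "Windy"):
    print(attr, info_gain(sunny, attr, "Play"))
# Temperature ≈ 0.571, Humidity ≈ 0.971, Windy ≈ 0.020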
Next step: find the next node in our decision tree

• IG(sunny, Humidity) is the largest value, so Humidity is the node that comes under Sunny.
• Play will occur if Humidity is normal and will not occur if it is high.
Final Decision Tree
Example 2

• https://medium.com/@rakendd/building-decision-trees-and-its-math-711862eea1c0
CART Vs ID3 Vs C4.5
Types of decision trees
