5.7 TYPES OF DECISION TREE ALGORITHMS
Generally, there are three types of decision tree algorithms:
1. Iterative Dichotomizer 3 (ID3) Algorithm
2. C4.5 Algorithm
3. Classification and Regression Tree (CART) Algorithm
[Figure: Types of Decision Tree Algorithms, shown as three boxes — ID3 Algorithm, C4.5 Algorithm, Classification and Regression Tree (CART) Algorithm]
Fig. 5.5. Types of decision tree algorithms
5.8 STEPS OF A GENERAL DECISION TREE ALGORITHM
1. Calculate the entropy (E) of every attribute (A) of the data set (S).
2. Split (partition) the data set (S) into subsets using the attribute for which the resulting entropy after splitting is minimized (or information gain is maximized).
3. Make a decision tree node containing that attribute.
4. Repeat steps 1, 2 and 3 on each subset until all examples are classified or no attributes remain.
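The steps above can be sketched in a few lines of Python. This is only an illustrative sketch; the function names and the dictionary-based row format are our own, not from the text:

```python
from collections import Counter
from math import log2

def entropy(labels):
    # Step 1: entropy E of a collection of class labels
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_attribute(rows, labels, attributes):
    # Step 2: pick the attribute whose split minimizes the weighted
    # entropy of the resulting subsets (i.e., maximizes information gain)
    def split_entropy(attr):
        total = 0.0
        for value in set(row[attr] for row in rows):
            subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
            total += len(subset) / len(rows) * entropy(subset)
        return total
    return min(attributes, key=split_entropy)
```

Step 3 would then create a node for the returned attribute, and step 4 would recurse on each subset.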
5.9 ITERATIVE DICHOTOMIZER 3 (ID3) ALGORITHM
The Iterative Dichotomizer 3 (ID3) algorithm was developed by Ross Quinlan. It is used to generate a decision tree from a given data set.
5.9.1 Pseudocode of ID3 Decision Tree Algorithm
ID3 (Examples, Target-Attribute, Attributes)
Create a root node for the tree.
If all examples are positive, return the single-node tree Root, with label = (+).
If all examples are negative, return the single-node tree Root, with label = (-).
If the list of predicting attributes is empty, then return the single-node tree Root
with label = (most common value of the target attribute in the examples).
Otherwise begin
    A <- the attribute that best classifies the examples.
    Decision tree attribute for Root = A.
    For each possible value (v) of A,
        Add a new tree branch below Root, corresponding to the test A = (v).
        Let Examples(v) be the subset of examples that have the value (v) for A.
        If Examples(v) is empty:
            Then below this new branch, add a leaf node with label = (most common
            target value in the examples)
        Else below this new branch, add the subtree ID3 (Examples(v), Target-Attribute,
            Attributes - {A})
End
Return Root
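As a sketch, the pseudocode above can be turned into runnable Python. The nested-dictionary tree representation and helper names below are our own choices, not from the text:

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def id3(examples, target, attributes):
    # `examples` is a list of dicts, `target` the class key,
    # `attributes` the list of predicting attribute names
    labels = [e[target] for e in examples]
    # All examples positive, or all negative: single-node tree with that label
    if len(set(labels)) == 1:
        return labels[0]
    # No predicting attributes left: most common target value
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # A <- the attribute that best classifies the examples
    def split_entropy(attr):
        total = 0.0
        for v in set(e[attr] for e in examples):
            sub = [e[target] for e in examples if e[attr] == v]
            total += len(sub) / len(examples) * entropy(sub)
        return total
    A = min(attributes, key=split_entropy)
    tree = {A: {}}
    # Add a branch below the root for each value v of A seen in the examples
    for v in set(e[A] for e in examples):
        subset = [e for e in examples if e[A] == v]
        tree[A][v] = id3(subset, target, [a for a in attributes if a != A])
    return tree
```

Because the loop draws values of A from the examples themselves, the "Examples(v) is empty" branch of the pseudocode never triggers in this sketch.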
5.9.2 Limitations of ID3 Decision Tree Algorithm
1. ID3 does not guarantee an optimal solution.
2. ID3 can overfit the training data.
3. ID3 is harder to use on continuous data as compared to discrete data.
Example 5.4. Make a decision tree from the given training data of Table 5.1.
Table 5.1. Training examples (data) for target concept "Play Tennis"
Day Outlook Temperature Humidity Wind Play Tennis
D1 Sunny Hot High Weak No
D2 Sunny Hot High Strong No
D3 Overcast Hot High Weak Yes
D4 Rain Mild High Weak Yes
D5 Rain Cool Normal Weak Yes
D6 Rain Cool Normal Strong No
D7 Overcast Cool Normal Strong Yes
D8 Sunny Mild High Weak No
D9 Sunny Cool Normal Weak Yes
D10 Rain Mild Normal Weak Yes
D11 Sunny Mild Normal Strong Yes
D12 Overcast Mild High Strong Yes
D13 Overcast Hot Normal Weak Yes
D14 Rain Mild High Strong No
Solution. Let us consider the data set (S) shown in Table 5.1. It consists of 14 training examples, of which 9 are positive ("Yes") and 5 are negative ("No"), denoted by (9+, 5-).
• Entropy of Entire Data Set Collection

Entropy (S) = - Σ (i = 1 to c) p_i . log2 p_i          ...(5.6)

where p_i = probability or proportion of +ve and -ve examples.

Entropy (9+, 5-) = -(9/14) log2 (9/14) - (5/14) log2 (5/14)
Entropy (9+, 5-) = 0.940
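The arithmetic of Eq. (5.6) can be checked in a couple of lines of Python:

```python
from math import log2

# Entropy of the full data set: 9 positive and 5 negative examples
p_pos, p_neg = 9 / 14, 5 / 14
entropy_s = -p_pos * log2(p_pos) - p_neg * log2(p_neg)
print(f"{entropy_s:.3f}")  # prints 0.940
```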
" Root Node Selection
4Attributes = Outlook, Temp.. Humidity, Wind
1 Target = Play Tennis
as root
The attribute which gives highest information gain (IG) is selected
" Attribute 1= Qutlook
Values (outlook) = Sunny, Overcast, Rain overcast and S
We will calculate the entropy (S) of each value .e., Ssunny" hav
exxamples
(i) Entropy of Sunny (Sunn): Count number of training examplest
"Sunny' in the Table 5.1.There are 2 +ve and 3-vetrain1ng
consists "Sunny" in the column of "Outlook" attribute.
SPsunny (2+, 3-)
Entropy (S_sunny) = - (Proportion of +ve Examples) . log2 (Proportion of +ve Examples)
                    - (Proportion of -ve Examples) . log2 (Proportion of -ve Examples)   ...(5.7)
Entropy (S_sunny) = -(2/5) log2 (2/5) - (3/5) log2 (3/5)
Entropy (S_sunny) = 0.971
(ii) Entropy of Overcast: S_overcast -> (4+, 0-)
Entropy (S_overcast) = -(4/4) log2 (4/4) - (0/4) log2 (0/4) = 0
(iii) Entropy of Rain: S_rain -> (3+, 2-)
Entropy (S_rain) = -(3/5) log2 (3/5) - (2/5) log2 (2/5) = 0.971
" Information Gain ofOutlook: (w.r.t. entire dataset S)
Outlook )= Entropy of all data set (S)
S,I Entropy (S,)
Gain Entire Attribute vE (Sunny |SI
Dataset Overcast, Rain)
...(5.8)
S. = Number of times any attribute value is appearing e.g.,
Sunny is appearing 5 times
Overcast is appearing 4 times
Rain is appearing 5 times.
Gain (S, Outlook) =Entropy (S) - 14 Entropy (Ssunny
-(6) Entropy (Sovercast
5
Entropy (Sin
Gain (S, Outlook) =0.94 )os71-()o-) 0.971
Gain (S, Outlook) = 0.2464
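Eq. (5.8) for Outlook can likewise be checked numerically; the helper H below is our own shorthand for the entropy of a (positive, negative) count pair. Note that carrying full precision gives 0.2467, while the figure 0.2464 comes from rounding the intermediate entropies:

```python
from math import log2

def H(p, n):
    # Entropy of a subset containing p positive and n negative examples
    total = p + n
    result = 0.0
    for count in (p, n):
        if count:
            result -= (count / total) * log2(count / total)
    return result

# Sunny (2+, 3-), Overcast (4+, 0-), Rain (3+, 2-)
gain_outlook = H(9, 5) - (5/14) * H(2, 3) - (4/14) * H(4, 0) - (5/14) * H(3, 2)
print(f"{gain_outlook:.4f}")  # prints 0.2467
```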
This is the information gain (IG) of the Outlook attribute, i.e., 0.2464. Similarly, we calculate the information gain of the remaining 3 attributes in Table 5.1, i.e., Temp., Humidity and Wind.
" Attribute 2 = Temperature
Value (Temp.) = Hot, Mild,
Cool
S= (9+,5)
9 9 5 5
log2 =0.94
Entropy (S) = j4
lo82 14 14 14
Slo (2+, 2-)
2
2_2
Entropy (SHo)=-4 log? 4 4
2
log. 4 =1
SMild = (4+, 2-)
4log
Entropy (SMl)=-log
4 2
log, 22 =0.9183
6 6 6 6
SConl =(3+, 1-)
3 3 1 1
Entropy (Sco)=-log, 4 4
log, 4
= 0.8113
4
M
Information Gain (S, Temp) = Entropy(S) Entropy (S,) . 53
(Hot,Cool, Mild)
4
Information Gain (S, Temp.) = Entropy (S) 14
Entropy (SHo)
6 4
14 Entropy(Mia) 14 Entropy(,w
6
I.G. = 0.94
= 0.0289
H*1-4 x0.9183 14
x0.813
Similarly, we obtain the information gain of the remaining two attributes. Collecting all four:
I. Gain (S, Outlook) = 0.2464 <- Highest IG
I. Gain (S, Temp.) = 0.0289
I. Gain (S, Humidity) = 0.1516
I. Gain (S, Wind) = 0.0478
Since Outlook has the highest information gain, the root node will be Outlook.
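The whole comparison over Table 5.1 can be reproduced in Python. At full precision the gains come out to about 0.2467, 0.0292, 0.1518 and 0.0481 (the values above differ only in the last decimal because intermediate entropies were rounded), and Outlook is still clearly the winner:

```python
from collections import Counter
from math import log2

# Table 5.1 rows: (Outlook, Temperature, Humidity, Wind, PlayTennis)
DATA = [
    ("Sunny", "Hot", "High", "Weak", "No"),      ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),  ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),   ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"), ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),  ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"), ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"), ("Rain", "Mild", "High", "Strong", "No"),
]
ATTRS = ["Outlook", "Temperature", "Humidity", "Wind"]

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain(col):
    # Information gain of the attribute stored in column `col`, per Eq. (5.8)
    labels = [r[-1] for r in DATA]
    g = entropy(labels)
    for v in set(r[col] for r in DATA):
        sub = [r[-1] for r in DATA if r[col] == v]
        g -= len(sub) / len(DATA) * entropy(sub)
    return g

for i, name in enumerate(ATTRS):
    print(f"{name}: {gain(i):.4f}")
```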
[Figure: root node Outlook over the full data set (D1, D2, ..., D14) (9+, 5-), with three branches:
Sunny -> (D1, D2, D8, D9, D11) (2+, 3-); Overcast -> (D3, D7, D12, D13) (4+, 0-), labelled Yes; Rain -> (D4, D5, D6, D10, D14) (3+, 2-)]
Fig. 5.6. Decision tree at the intermediate stage
Similarly, we can grow the next nodes below the root node. The final decision tree will be as shown in Fig. 5.7.
[Figure: root node Outlook (9+, 5-), with three branches:
Sunny -> Humidity: High -> No (D1, D2, D8); Normal -> Yes (D9, D11)
Overcast (or Cloudy) -> Yes (D3, D7, D12, D13) (4+, 0-)
Rain -> Wind: Strong -> No (D6, D14); Weak -> Yes (D4, D5, D10)]
Fig. 5.7. Final decision tree using ID3 algorithm
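The final tree of Fig. 5.7 can be encoded as nested dictionaries and used to classify new days; this representation is illustrative, not from the text:

```python
# Final decision tree of Fig. 5.7, hand-encoded as nested dicts:
# an inner dict {attribute: {value: subtree}} is a test node,
# a bare string is a leaf (class label)
TREE = {
    "Outlook": {
        "Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain": {"Wind": {"Strong": "No", "Weak": "Yes"}},
    }
}

def classify(tree, instance):
    # Walk the tree until a leaf (class label string) is reached
    while isinstance(tree, dict):
        attribute = next(iter(tree))
        tree = tree[attribute][instance[attribute]]
    return tree

# Day D1 from Table 5.1: Sunny / High humidity / Weak wind
print(classify(TREE, {"Outlook": "Sunny", "Humidity": "High", "Wind": "Weak"}))  # prints No
```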
5.10 INDUCTIVE BIAS IN DECISION TREE LEARNING (LEARNING BIAS)
1. The inductive bias (or learning bias) of a machine learning algorithm is the set of assumptions that the learner uses to predict outputs for inputs it has not encountered.
2. An approximation of the inductive bias of the ID3 decision tree algorithm: "Shorter trees are preferred over longer trees. Trees that place high information gain attributes close to the root are preferred over those that do not."
3. Inductive bias is a "policy" by which the decision tree algorithm (ID3) generalizes from observed training examples to classify unseen instances.
4. Inductive bias is an essential requirement in machine learning. With inductive bias, a learning algorithm can generalize to new, unseen examples.
5. Bias: The assumptions made by a model to make a function easier to learn are called bias.
6. Variance: If a model gives very low error on the training data and very high error on the test data, then this difference is called 'variance' in machine learning.