Decision Tree-Using Entropy

Decision Tree-based Learning

Example

Decision tree representation (PlayTennis)

Example instance: Outlook=Sunny, Temp=Hot, Humidity=High, Wind=Strong → No
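A minimal sketch of how such a tree can be represented and walked to classify this instance, assuming a nested-dict representation in Python (the dict layout and the classify helper are illustrative choices, not part of the slides); the tree below encodes the PlayTennis tree spelled out on the expressivity slide that follows:

# PlayTennis tree: internal nodes test one attribute, leaves hold the class label.
tree = {"Outlook": {
    "Sunny":    {"Humidity": {"High": "No", "Normal": "Yes"}},
    "Overcast": "Yes",
    "Rain":     {"Wind": {"Strong": "No", "Weak": "Yes"}},
}}

def classify(node, instance):
    # Follow the branch matching the instance's value for the tested attribute.
    while isinstance(node, dict):
        attribute, branches = next(iter(node.items()))
        node = branches[instance[attribute]]
    return node

x = {"Outlook": "Sunny", "Temp": "Hot", "Humidity": "High", "Wind": "Strong"}
print(classify(tree, x))   # -> No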

Decision trees expressivity
• Decision trees represent a disjunction of conjunctions of constraints on the values of attributes:
(Outlook = Sunny ∧ Humidity = Normal)
∨ (Outlook = Overcast)
∨ (Outlook = Rain ∧ Wind = Weak)

Top-down induction of Decision Trees
• ID3 (Quinlan, 1986) is the basic algorithm used to build decision trees.
• Given a training set of examples, the algorithm searches the space of possible decision trees.
• The tree is built top-down, and the algorithm is greedy.
• The fundamental question is: "Which attribute should be tested next? Which attribute gives us more information?"
• Select the best attribute.
• A descendant node is then created for each possible value of this attribute, and the data set is partitioned according to that value.
• The process is repeated for each successor node until all the examples are classified correctly or there are no attributes left (a sketch of this loop is given below).
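A minimal sketch of this top-down, greedy loop, assuming training examples are Python dicts and that an info_gain(examples, target, attribute) helper is available (hypothetical here; a count-based version is sketched later in the information-gain section). Names and the dict-of-dicts tree layout are illustrative:

from collections import Counter

def id3(examples, target, attributes):
    labels = [ex[target] for ex in examples]
    # Stop when the node is pure or no attributes remain; return a (majority-class) leaf.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Greedy choice: the attribute with the highest information gain.
    best = max(attributes, key=lambda a: info_gain(examples, target, a))
    tree = {best: {}}
    for value in {ex[best] for ex in examples}:
        subset = [ex for ex in examples if ex[best] == value]
        # Recurse on each partition with the chosen attribute removed.
        tree[best][value] = id3(subset, target, [a for a in attributes if a != best])
    return tree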

Which attribute is the best classifier?
• A statistical property called information gain measures how well a given attribute separates the training examples.
• Information gain uses the notion of entropy, commonly used in information theory.
• Information gain = expected reduction in entropy.

Entropy in binary classification
• Entropy measures the impurity of a collection of examples. It depends on the distribution of the random variable p.
• S is a collection of training examples
• p+ is the proportion of positive examples in S
• p– is the proportion of negative examples in S
Entropy(S) ≡ – p+ log2 p+ – p– log2 p–        [with the convention 0 log2 0 = 0]
Entropy([14+, 0–]) = – 14/14 log2(14/14) – 0 log2(0) = 0
Entropy([9+, 5–]) = – 9/14 log2(9/14) – 5/14 log2(5/14) = 0.94
Entropy([7+, 7–]) = – 7/14 log2(7/14) – 7/14 log2(7/14) = 1/2 + 1/2 = 1        [log2(1/2) = –1]
Note: the log of a number < 1 is negative; since 0 ≤ p ≤ 1, we have 0 ≤ entropy ≤ 1.
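A minimal sketch of the two-class entropy (the function name entropy_binary is an illustrative choice), reproducing the three values above:

from math import log2

def entropy_binary(p_pos, p_neg):
    # Entropy of a two-class collection, using the convention 0 * log2(0) = 0.
    return -sum(p * log2(p) for p in (p_pos, p_neg) if p > 0)

print(round(entropy_binary(14/14, 0/14), 3))   # 0.0  -> pure collection
print(round(entropy_binary(9/14, 5/14), 3))    # 0.94 -> the [9+, 5-] training set
print(round(entropy_binary(7/14, 7/14), 3))    # 1.0  -> maximally impure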
Entropy in general
• Entropy measures the amount of information in a random variable
H(X) = – p+ log2 p+ – p– log2 p–        X = {+, –}
for binary classification [two-valued random variable]
H(X) = – Σ_{i=1..c} p_i log2 p_i = Σ_{i=1..c} p_i log2(1/p_i)        X = {1, …, c}
for classification into c classes
Example: rolling a die with 8 equally probable sides
H(X) = – Σ_{i=1..8} (1/8) log2(1/8) = – log2(1/8) = log2 8 = 3
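A sketch of the general c-class formula (the helper name entropy is illustrative), checked against the 8-sided-die example:

from math import log2

def entropy(probs):
    # H(X) = -sum(p_i * log2(p_i)) over a probability distribution, skipping zero terms.
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([1/8] * 8))   # 3.0 bits for a fair 8-sided die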
Information gain as entropy reduction
• Information gain is the expected reduction in entropy caused by partitioning the examples on an attribute.
• The higher the information gain, the more effective the attribute is in classifying the training data.
• Expected reduction in entropy from knowing A:
Gain(S, A) = Entropy(S) – Σ_{v ∈ Values(A)} (|Sv| / |S|) Entropy(Sv)
where Values(A) is the set of possible values for A, and Sv is the subset of S for which A has value v.
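A minimal sketch of Gain(S, A) working on class-count vectors, one vector [n+, n–, …] per value of A (the helper names entropy and info_gain are illustrative):

from math import log2

def entropy(counts):
    # Entropy of a collection described by its per-class counts.
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)

def info_gain(parent_counts, value_splits):
    # Gain(S, A) = Entropy(S) - sum over v of |Sv|/|S| * Entropy(Sv).
    n = sum(parent_counts)
    return entropy(parent_counts) - sum(sum(sv) / n * entropy(sv) for sv in value_splits)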
Example: expected information gain
• Let
  • Values(Wind) = {Weak, Strong}
  • S = [9+, 5–]
  • SWeak = [6+, 2–]
  • SStrong = [3+, 3–]
• Information gain due to knowing Wind:
Gain(S, Wind) = Entropy(S) – 8/14 Entropy(SWeak) – 6/14 Entropy(SStrong)
             = 0.94 – 8/14 × 0.811 – 6/14 × 1.00
             = 0.048
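The same arithmetic as a self-contained check (equivalently, the info_gain sketch above could be called as info_gain([9, 5], [[6, 2], [3, 3]])):

from math import log2

H_S      = -(9/14) * log2(9/14) - (5/14) * log2(5/14)    # Entropy(S)       ≈ 0.94
H_weak   = -(6/8)  * log2(6/8)  - (2/8)  * log2(2/8)     # Entropy(SWeak)   ≈ 0.811
H_strong = -(3/6)  * log2(3/6)  - (3/6)  * log2(3/6)     # Entropy(SStrong) = 1.0
print(round(H_S - 8/14 * H_weak - 6/14 * H_strong, 3))   # Gain(S, Wind)    = 0.048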
Which attribute is the best classifier?
First step: which attribute to test at the root?

• Which attribute should be tested at the root?
  • Gain(S, Outlook) = 0.246
  • Gain(S, Humidity) = 0.151
  • Gain(S, Wind) = 0.048
  • Gain(S, Temperature) = 0.029
• Outlook provides the best prediction for the target.
• Let's grow the tree:
  • add to the tree a successor for each possible value of Outlook
  • partition the training samples according to the value of Outlook (the four gains above are reproduced in the sketch below)
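These root gains can be reproduced with the info_gain sketch from above, repeated here so the snippet runs on its own. The per-value class counts are an assumption taken from the standard 14-example PlayTennis table (Mitchell, 1997), which these slides appear to follow:

from math import log2

def entropy(counts):
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)

def info_gain(parent_counts, value_splits):
    n = sum(parent_counts)
    return entropy(parent_counts) - sum(sum(sv) / n * entropy(sv) for sv in value_splits)

S = [9, 5]   # [positive, negative] counts for the full training set
splits = {   # per-value [pos, neg] counts, assuming the standard PlayTennis table
    "Outlook":     [[2, 3], [4, 0], [3, 2]],   # Sunny, Overcast, Rain
    "Humidity":    [[3, 4], [6, 1]],           # High, Normal
    "Wind":        [[6, 2], [3, 3]],           # Weak, Strong
    "Temperature": [[2, 2], [4, 2], [3, 1]],   # Hot, Mild, Cool
}
for attribute, sv in splits.items():
    print(attribute, round(info_gain(S, sv), 3))
# Prints 0.247, 0.152, 0.048, 0.029; the slides' 0.246 and 0.151 come from
# intermediate rounding of the same computation.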
After first step
Second step
• Working on the Outlook = Sunny node:
Gain(SSunny, Humidity) = 0.970 – 3/5 × 0.0 – 2/5 × 0.0 = 0.970
Gain(SSunny, Wind) = 0.970 – 2/5 × 1.0 – 3/5 × 0.918 = 0.019
Gain(SSunny, Temp.) = 0.970 – 2/5 × 0.0 – 2/5 × 1.0 – 1/5 × 0.0 = 0.570
• Humidity provides the best prediction for the target.
• Let's grow the tree:
  • add to the tree a successor for each possible value of Humidity
  • partition the training samples according to the value of Humidity (these three gains are checked in the sketch below)
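A quick self-contained check of the three gains above, using the slide's own weights and subset entropies (variable names are illustrative; the third decimals differ slightly from the slides because the slides round the intermediate entropies):

from math import log2

# SSunny = [2+, 3-]: the five training examples with Outlook = Sunny
H_sunny = -(2/5) * log2(2/5) - (3/5) * log2(3/5)              # ≈ 0.971
print(round(H_sunny - 3/5 * 0.0 - 2/5 * 0.0, 3))              # Humidity:    0.971
print(round(H_sunny - 2/5 * 1.0 - 3/5 * 0.918, 3))            # Wind:        0.02
print(round(H_sunny - 2/5 * 0.0 - 2/5 * 1.0 - 1/5 * 0.0, 3))  # Temperature: 0.571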
Second and third steps

[Figure: the finished tree. Leaf partitions: {D1, D2, D8} → No, {D9, D11} → Yes, {D4, D5, D10} → Yes, {D6, D14} → No]
Thanks
