2.3 Decision-Tree-Algorithm
Definition
• A decision tree is a classifier in the form of a tree structure with two types of nodes:
– Decision node: specifies a choice or test of some attribute, with one branch for each outcome
– Leaf node: indicates the classification of an example
• Decision tree learning is a recursive partitioning algorithm.
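To make the two node types concrete, here is a minimal Python sketch (not from the slides; class and field names are my own) of a tree built from decision nodes and leaf nodes, together with the classification walk it supports.

from dataclasses import dataclass
from typing import Dict, Union

@dataclass
class Leaf:
    """Leaf node: indicates the classification of an example."""
    label: str

@dataclass
class Decision:
    """Decision node: tests one attribute, with one branch per outcome."""
    attribute: str
    branches: Dict[str, Union["Decision", "Leaf"]]

def classify(node, example):
    """Walk the tree from the root until a leaf is reached."""
    while isinstance(node, Decision):
        node = node.branches[example[node.attribute]]
    return node.label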
Decision Tree Example 1
Whether to approve a loan
[Figure: loan-approval decision tree: the root tests Employed?; one branch tests Credit Score? (High/Low), the other tests Income? (High/Low), and each leaf is a Yes/No approval decision.]
Decision Tree for PlayTennis
[Figure: decision tree for PlayTennis: the root tests Outlook (Sunny / Overcast / Rain); the Sunny branch tests Humidity, the Rain branch tests Wind, and each leaf is a Yes/No classification.]
Decision Tree
Decision trees represent disjunctions of conjunctions of constraints on attribute values. The PlayTennis tree corresponds to:
(Outlook = Sunny ∧ Humidity = Normal)
∨ (Outlook = Overcast)
∨ (Outlook = Rain ∧ Wind = Weak)
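The same tree can be read directly as a Boolean rule. The sketch below is only an illustration of the disjunction above; the dictionary keys are assumed attribute names.

def play_tennis(e):
    """PlayTennis as a disjunction of conjunctions (read off the tree above)."""
    return ((e["Outlook"] == "Sunny" and e["Humidity"] == "Normal")
            or e["Outlook"] == "Overcast"
            or (e["Outlook"] == "Rain" and e["Wind"] == "Weak"))

# Example: play_tennis({"Outlook": "Rain", "Wind": "Weak", "Humidity": "High"}) -> True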
Searching for a good tree
• How should you go about building a decision tree?
• The space of decision trees is too big for systematic
search.
• Instead, grow the tree greedily and recursively. At each node, either stop and
– return a value for the target feature, or
– return a distribution over target feature values,
or else choose an attribute to test and recursively build a subtree for each outcome.
Principled Criterion
• Selection of an attribute to test at each node: choose the attribute that is most useful for classifying the examples.
• Information gain
– measures how well a given attribute separates the training examples according to their target classification
– is used to select among the candidate attributes at each step while growing the tree
– measures how much we can reduce uncertainty (its value lies between 0 and 1)
Entropy
• A measure of
– uncertainty
– impurity
– information content
• Information theory: an optimal-length code assigns (-log2 p) bits to a message having probability p
• S is a sample of training examples
– p+ is the proportion of positive examples in S
– p- is the proportion of negative examples in S
• Entropy of S: the average optimal number of bits needed to encode the class (the certainty/uncertainty) of an example in S
Entropy(S) = p+(-log2 p+) + p-(-log2 p-) = -p+ log2 p+ - p- log2 p-
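A minimal numeric sketch of this definition in Python (the slides give no code; the function name is my own):

import math

def entropy(pos, neg):
    """Entropy of a sample with `pos` positive and `neg` negative examples."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        p = count / total
        if p > 0:                      # by convention 0*log2(0) = 0
            result -= p * math.log2(p)
    return result

print(entropy(9, 5))   # ~0.9403, the 0.940 used on the slides for S = [9+,5-]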
Entropy
Gain(S, Humidity) = 0.940 - (7/14)*0.985 - (7/14)*0.592 = 0.151
Gain(S, Wind) = 0.940 - (8/14)*0.811 - (6/14)*1.0 = 0.048
Humidity provides greater information gain than Wind with respect to the target classification.
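A sketch of the gain computation from class counts, reusing the entropy helper above. The per-branch counts ([3+,4-] and [6+,1-] for Humidity, [6+,2-] and [3+,3-] for Wind) are assumptions taken from the standard PlayTennis data; they are consistent with the fractions and entropies shown on the slide.

def gain(parent_counts, branch_counts):
    """Information gain = parent entropy - weighted average entropy of the branches."""
    total = sum(parent_counts)
    g = entropy(*parent_counts)
    for pos, neg in branch_counts:
        g -= (pos + neg) / total * entropy(pos, neg)
    return g

# S = [9+,5-]
print(round(gain((9, 5), [(3, 4), (6, 1)]), 3))   # Humidity: 0.152 here; the slide's 0.151 rounds the intermediate entropies
print(round(gain((9, 5), [(6, 2), (3, 3)]), 3))   # Wind: 0.048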
Selecting the Next Attribute
S = [9+,5-], Entropy(S) = 0.940
Gain(S, Outlook) = 0.940 - (5/14)*0.971 - (4/14)*0.0 - (5/14)*0.971 = 0.247
Selecting the Next Attribute
The information gain values for the 4 attributes are:
• Gain(S,Outlook) =0.247
• Gain(S,Humidity) =0.151
• Gain(S,Wind) =0.048
• Gain(S,Temperature) =0.029
Note: 0*log2(0) = 0 by convention.
ID3 Algorithm
[Figure: the first step of ID3 on examples D1,D2,…,D14 ([9+,5-]): Outlook is placed at the root; one branch is already a pure Yes leaf, while the other branches still need a test for their node and are grown recursively.]
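A compact sketch of the ID3 recursion under the ideas above. It is not the slides' own pseudocode; it assumes a binary Yes/No target, reuses the entropy helper defined earlier, and ignores ties, missing values, and pruning.

from collections import Counter

def info_gain(examples, attr, target):
    """Information gain of splitting `examples` (a list of dicts) on `attr`."""
    def counts(rows):
        pos = sum(1 for e in rows if e[target] == "Yes")
        return pos, len(rows) - pos
    g = entropy(*counts(examples))          # entropy() as defined earlier
    for value in set(e[attr] for e in examples):
        subset = [e for e in examples if e[attr] == value]
        g -= len(subset) / len(examples) * entropy(*counts(subset))
    return g

def id3(examples, attributes, target="PlayTennis"):
    """Recursive partitioning: pick the highest-gain attribute, split, recurse."""
    labels = [e[target] for e in examples]
    if len(set(labels)) == 1:               # pure node -> leaf
        return labels[0]
    if not attributes:                      # no attributes left -> majority leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: info_gain(examples, a, target))
    tree = {best: {}}
    for value in set(e[best] for e in examples):
        subset = [e for e in examples if e[best] == value]
        remaining = [a for a in attributes if a != best]
        tree[best][value] = id3(subset, remaining, target)
    return tree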
[Figure: entropy reduction by data set partitioning: a set of examples is split on Color? into red, green, and yellow subsets; each subset is purer than the original set, so the weighted average entropy after the split is lower than before it.]
Information Gain
[Figure: information gain of the Color? split: the entropy of the parent set minus the weighted average entropy of the red, green, and yellow subsets.]
Information Gain of The Attribute
• Attributes
– Gain(Color) = 0.246
– Gain(Outline) = 0.151
– Gain(Dot) = 0.048
• Heuristic: the attribute with the highest gain is chosen
• This heuristic is greedy and local (local minimization of impurity)
[Figure: growing the tree one level deeper. Within the red branch of Color? (entropy 0.971), Gain(Outline) = 0.971 - 0.951 = 0.020 bits and Gain(Dot) = 0.971 - 0 = 0.971 bits, so Dot? is chosen as the next test; the figure also shows the resulting Dot? (yes/no) and Outline? (solid/dashed) splits.]
Decision Tree
[Figure: the resulting decision tree, with Color at the root and branches for red, green, and yellow leading to further tests and leaves.]
An alternative impurity measure is the GINI index:
GINI_node(N) = 1 - Σ_{c ∈ classes} [p(c)]^2
GINI_split(A) = Σ_{v ∈ Values(A)} (|Sv| / |S|) * GINI(Nv)
where Nv is the child node reached by branch v and Sv is its set of examples.
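A small numeric sketch of these two formulas (function names are my own):

def gini(counts):
    """GINI impurity of a node from its per-class counts: 1 - sum of p(c)^2."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def gini_split(branch_counts):
    """Weighted GINI of a split, given per-class counts for each branch."""
    total = sum(sum(b) for b in branch_counts)
    return sum(sum(b) / total * gini(b) for b in branch_counts)

# Example: a pure node has GINI 0, a 50/50 binary node has GINI 0.5
print(gini([5, 0]), gini([5, 5]))   # 0.0 0.5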
Splitting Based on Continuous Attributes
[Figure: two ways to split on a continuous attribute: a binary test (Taxable Income > 80K? with branches Yes / No) or a multi-way test on ranges of Taxable Income (< 10K, …, > 80K).]
A threshold c turns a continuous attribute A into a Boolean one:
A_c = true if A > c, false otherwise
How to choose c?
• Consider all possible splits and find the best cut, as sketched below.
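A sketch of this exhaustive search, scoring each candidate cut by information gain. Placing candidate thresholds midway between adjacent sorted values is a common convention, not something stated on the slides; the code assumes binary Yes/No labels and reuses the entropy helper from above.

def best_threshold(values, labels):
    """Try every cut midway between adjacent sorted values; return the best by gain."""
    pairs = sorted(zip(values, labels))
    pos_total = sum(1 for _, y in pairs if y == "Yes")
    parent = entropy(pos_total, len(pairs) - pos_total)
    best_c, best_gain = None, -1.0
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                          # no cut between equal values
        c = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for v, y in pairs if v <= c]
        right = [y for v, y in pairs if v > c]
        g = parent
        for side in (left, right):
            pos = sum(1 for y in side if y == "Yes")
            g -= len(side) / len(pairs) * entropy(pos, len(side) - pos)
        if g > best_gain:
            best_c, best_gain = c, g
    return best_c, best_gain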
Strengths of decision tree methods
• Ability to generate understandable rules
• Ability to handle missing values
• Low computational cost of classifying new examples