09 - ML - Decision Tree

This document introduces decision trees and provides an example of how a decision tree model is constructed from training data to classify whether individuals will buy a computer based on attributes like age, income, and student status. It then shows how the trained decision tree model can be applied to new test data to make predictions. The general process of decision tree induction builds the tree in a top-down, recursive manner by splitting the training data on attributes that best separate the classes at each node.


Decision Tree

Introduction to Machine Learning

Dr. Hikmat Ullah Khan


Assistant Professor
COMSATS Institute of Information Technology,
Wah Cantt, Pakistan
Email: Hikmat.ullah@ciitwah.edu.pk

Decision Tree Induction: An Example
Training data set: buys_computer

  age      income   student   credit_rating   buys_computer
  <=30     high     no        fair            no
  <=30     high     no        excellent       no
  31..40   high     no        fair            yes
  >40      medium   no        fair            yes
  >40      low      yes       fair            yes
  >40      low      yes       excellent       no
  31..40   low      yes       excellent       yes
  <=30     medium   no        fair            no
  <=30     low      yes       fair            yes
  >40      medium   yes       fair            yes
  <=30     medium   yes       excellent       yes
  31..40   medium   no        excellent       yes
  31..40   high     yes       fair            yes
  >40      medium   no        excellent       no

Resulting tree: the root tests age.
  age <=30   -> test student:        no -> no,         yes -> yes
  age 31..40 -> yes
  age >40    -> test credit_rating:  excellent -> no,  fair -> yes
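Written out as a data structure, the resulting tree is small enough to inspect directly. Below is a minimal Python sketch (the nested-dict encoding and the classify helper are illustrative choices of mine, not part of the slides):

```python
# Resulting tree from the slide: internal nodes map an attribute name to
# {attribute value: subtree}; leaves are plain class labels ("yes"/"no").
tree = {
    "age": {
        "<=30":   {"student": {"no": "no", "yes": "yes"}},
        "31..40": "yes",
        ">40":    {"credit_rating": {"excellent": "no", "fair": "yes"}},
    }
}

def classify(node, record):
    """Follow the branches matching `record` until a leaf label is reached."""
    while isinstance(node, dict):
        attribute = next(iter(node))              # attribute tested at this node
        node = node[attribute][record[attribute]]
    return node

# A record from the training data: age <=30, low income, student, fair credit.
print(classify(tree, {"age": "<=30", "income": "low",
                      "student": "yes", "credit_rating": "fair"}))   # -> "yes"
```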
Example of a Decision Tree

Splitting attributes: Refund, Marital Status (MarSt), Taxable Income (TaxInc).

Training data:

  Tid   Refund   Marital Status   Taxable Income   Cheat
  1     Yes      Single           125K             No
  2     No       Married          100K             No
  3     No       Single           70K              No
  4     Yes      Married          120K             No
  5     No       Divorced         95K              Yes
  6     No       Married          60K              No
  7     Yes      Divorced         220K             No
  8     No       Single           85K              Yes
  9     No       Married          75K              No
  10    No       Single           90K              Yes

Model (decision tree):
  Refund?   Yes -> NO
            No  -> MarSt?
  MarSt?    Married          -> NO
            Single, Divorced -> TaxInc?
  TaxInc?   < 80K -> NO
            > 80K -> YES
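Read as code, the model is just a chain of nested tests on Refund, Marital Status, and Taxable Income. A minimal Python sketch (the function and argument names are mine, not from the slides):

```python
def predict_cheat(refund, marital_status, taxable_income):
    """Decision tree from the slide: Refund -> MarSt -> TaxInc."""
    if refund == "Yes":
        return "No"
    # Refund == "No": test marital status next.
    if marital_status == "Married":
        return "No"
    # Single or Divorced: test taxable income.
    return "Yes" if taxable_income > 80_000 else "No"

# Tid 8 from the training data: Refund = No, Single, 85K -> predicted "Yes".
print(predict_cheat("No", "Single", 85_000))
```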


Decision Tree Classification Task
A tree-induction algorithm learns a model from the training set (induction); the resulting decision tree is then applied to the test set to assign class labels (deduction).

Training set:

  Tid   Attrib1   Attrib2   Attrib3   Class
  1     Yes       Large     125K      No
  2     No        Medium    100K      No
  3     No        Small     70K       No
  4     Yes       Medium    120K      No
  5     No        Large     95K       Yes
  6     No        Medium    60K       No
  7     Yes       Large     220K      No
  8     No        Small     85K       Yes
  9     No        Medium    75K       No
  10    No        Small     90K       Yes

Test set:

  Tid   Attrib1   Attrib2   Attrib3   Class
  11    No        Small     55K       ?
  12    Yes       Medium    80K       ?
  13    Yes       Large     110K      ?
  14    No        Small     95K       ?
  15    No        Large     67K       ?
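The induction/deduction loop in this figure maps directly onto a standard machine-learning API. A hedged sketch using scikit-learn (the library choice, the one-hot encoding, and the parameter settings are mine; the slides do not prescribe any particular tool):

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Training set from the slide (Tids 1-10); Attrib3 is in thousands.
train = pd.DataFrame({
    "Attrib1": ["Yes", "No", "No", "Yes", "No", "No", "Yes", "No", "No", "No"],
    "Attrib2": ["Large", "Medium", "Small", "Medium", "Large",
                "Medium", "Large", "Small", "Medium", "Small"],
    "Attrib3": [125, 100, 70, 120, 95, 60, 220, 85, 75, 90],
    "Class":   ["No", "No", "No", "No", "Yes", "No", "No", "Yes", "No", "Yes"],
})

# Test set from the slide (Tids 11-15); Class is unknown.
test = pd.DataFrame({
    "Attrib1": ["No", "Yes", "Yes", "No", "No"],
    "Attrib2": ["Small", "Medium", "Large", "Small", "Large"],
    "Attrib3": [55, 80, 110, 95, 67],
})

# One-hot encode the categorical attributes so both sets share the same columns.
X_train = pd.get_dummies(train.drop(columns="Class"))
X_test = pd.get_dummies(test).reindex(columns=X_train.columns, fill_value=0)

# Induction: learn the model from the training set.
model = DecisionTreeClassifier(criterion="entropy", random_state=0)
model.fit(X_train, train["Class"])

# Deduction: apply the model to the test set.
print(model.predict(X_test))
```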
Apply Model to Test Data

Test data:

  Refund   Marital Status   Taxable Income   Cheat
  No       Married          80K              ?

Start from the root of the tree and follow the branch that matches the test record at each node:
  Refund = No              -> take the "No" branch to the MarSt test
  Marital Status = Married -> take the "Married" branch, which is a leaf
  Assign Cheat = "No" (the Taxable Income test is never reached)
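The same traversal can be written as a few lines of Python; an illustrative trace of the slide's test record (the variable names are mine):

```python
# Test record from the slide: Refund = No, Marital Status = Married, Taxable Income = 80K.
record = {"Refund": "No", "Marital Status": "Married", "Taxable Income": 80_000}

# Start from the root: Refund = "No" sends us to the Marital Status test, and
# "Married" leads straight to a "NO" leaf, so Taxable Income is never examined.
if record["Refund"] == "Yes":
    cheat = "No"
elif record["Marital Status"] == "Married":
    cheat = "No"
else:
    cheat = "Yes" if record["Taxable Income"] > 80_000 else "No"

print("Assign Cheat =", cheat)      # Assign Cheat = No
```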
Algorithm for Decision Tree Induction

- Basic algorithm (a greedy algorithm)
  - The tree is constructed in a top-down, recursive, divide-and-conquer manner
  - At the start, all the training examples are at the root
  - Attributes are categorical (continuous attributes are discretized in advance)
  - Test attributes are selected on the basis of a statistical measure, e.g., information gain (a code sketch follows below)
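A minimal sketch of this greedy, top-down, recursive procedure (the helper functions, data layout, and use of information gain below are illustrative choices of mine; production implementations add discretization, pruning, tie-breaking, and other details):

```python
import math
from collections import Counter

def entropy(labels):
    """-sum p_i log2 p_i over the class distribution of `labels`."""
    total = len(labels)
    return -sum(n / total * math.log2(n / total) for n in Counter(labels).values())

def information_gain(records, attribute, target):
    """Entropy before the split minus the weighted entropy of the partitions."""
    before = entropy([r[target] for r in records])
    after = 0.0
    for value in {r[attribute] for r in records}:
        subset = [r[target] for r in records if r[attribute] == value]
        after += len(subset) / len(records) * entropy(subset)
    return before - after

def build_tree(records, attributes, target="class"):
    """Top-down, recursive, divide-and-conquer induction on categorical data."""
    labels = [r[target] for r in records]
    if len(set(labels)) == 1:                 # all records in one class: leaf
        return labels[0]
    if not attributes:                        # nothing left to test: majority leaf
        return Counter(labels).most_common(1)[0][0]
    # Select the test attribute with a statistical measure (information gain here).
    best = max(attributes, key=lambda a: information_gain(records, a, target))
    node = {best: {}}
    for value in {r[best] for r in records}:  # divide and conquer on each branch
        subset = [r for r in records if r[best] == value]
        node[best][value] = build_tree(subset, [a for a in attributes if a != best], target)
    return node
```

Called on the buys_computer table with the attributes ["age", "income", "student", "credit_rating"], this sketch reproduces the tree shown earlier, with age at the root.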
Decision Tree Induction

- Many algorithms:
  - Hunt's Algorithm (one of the earliest)
  - CART
  - ID3, C4.5, C5.0
  - SLIQ, SPRINT
General Structure

- Let Dt be the set of training records that reach a node t
- General procedure:
  - If Dt contains records that all belong to the same class yt, then t is a leaf node labeled as yt
  - If Dt contains records that belong to more than one class, use an attribute test to split the data into smaller subsets, then recursively apply the procedure to each subset

Illustration on the Refund / Marital Status / Taxable Income data (Tids 1-10 above): the tree grows step by step, first splitting on Refund (Yes -> Don't Cheat), then splitting the Refund = No records on Marital Status (Married -> Don't Cheat), and finally splitting the Single/Divorced records on Taxable Income (< 80K -> Don't Cheat, >= 80K -> Cheat).
Tree Induction

- Greedy strategy: split the records based on an attribute test that optimizes a certain criterion
- Issues
  - Determine how to split the records
    - How to specify the attribute test condition?
    - How to determine the best split?
  - Determine when to stop splitting
How to Specify Test Condition?

- Depends on the attribute type
  - Nominal
  - Ordinal
  - Continuous
- Depends on the number of ways to split
  - 2-way (binary) split
  - Multi-way split
Splitting Based on Nominal Attributes

- Multi-way split: use as many partitions as there are distinct values.
    CarType: Family | Sports | Luxury

- Binary split: divides the values into two subsets; we need to find the optimal partitioning (see the enumeration sketch below).
    CarType: {Sports, Luxury} vs {Family}    OR    {Family, Luxury} vs {Sports}
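Finding the optimal binary partitioning means scoring every way of splitting the value set in two. A small illustrative sketch that enumerates the candidate splits for CarType (the enumeration strategy is mine; scoring each candidate would use the impurity measures introduced later):

```python
from itertools import combinations

values = ["Family", "Sports", "Luxury"]      # distinct values of CarType

# Enumerate every split of the values into two non-empty subsets.  Fixing
# values[0] on the left-hand side avoids listing each split twice; for k
# distinct values there are 2**(k-1) - 1 candidates.
rest = values[1:]
for size in range(len(rest) + 1):
    for extra in combinations(rest, size):
        left = {values[0], *extra}
        right = set(values) - left
        if right:                            # skip the split with an empty side
            print(sorted(left), "vs", sorted(right))
# ['Family'] vs ['Luxury', 'Sports']
# ['Family', 'Sports'] vs ['Luxury']
# ['Family', 'Luxury'] vs ['Sports']
```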
Splitting Based on Ordinal Attributes

- Multi-way split: use as many partitions as there are distinct values.
    Size: Small | Medium | Large

- Binary split: divides the values into two subsets; we need to find the optimal partitioning.
    Size: {Small, Medium} vs {Large}    OR    {Small} vs {Medium, Large}

- What about the split {Small, Large} vs {Medium}? It does not respect the order of the values, so it is normally not allowed for an ordinal attribute.
Splitting Based on Continuous Attributes

- Different ways of handling:
  - Discretization to form an ordinal categorical attribute (multi-way), as was done for the age attribute earlier
  - Binary decision: (A < v) or (A >= v)
    - consider all possible split points and find the best cut (a sketch follows after the next slide)
    - can be more compute-intensive
Splitting Based on Continuous Attributes

  (i) Binary split:     Taxable Income > 80K?   Yes / No
  (ii) Multi-way split: Taxable Income?   < 10K | [10K, 25K) | [25K, 50K) | [50K, 80K) | > 80K
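For the binary test, a standard approach is to sort the values and evaluate a candidate cut midway between each adjacent pair, keeping the one with the lowest weighted impurity. A hedged sketch on the Taxable Income records that reach the TaxInc node in the earlier tree, i.e. Refund = No and Marital Status in {Single, Divorced} (using the Gini index, which is only defined a few slides later, is my choice of criterion):

```python
def gini(labels):
    """Gini impurity of a list of class labels (measure defined later in the slides)."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_cut(values, labels):
    """Try a threshold midway between each pair of adjacent sorted values."""
    pairs = sorted(zip(values, labels))
    best = None
    for i in range(1, len(pairs)):
        v = (pairs[i - 1][0] + pairs[i][0]) / 2                  # candidate cut point
        left = [lab for val, lab in pairs if val < v]
        right = [lab for val, lab in pairs if val >= v]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if best is None or score < best[0]:
            best = (score, v)
    return best

# Taxable Income (in thousands) and Cheat labels for Tids 3, 5, 8, 10.
income = [70, 95, 85, 90]
cheat = ["No", "Yes", "Yes", "Yes"]
print(best_cut(income, cheat))   # (0.0, 77.5): a pure cut just below 80,
                                 # matching the "< 80K / > 80K" test in the earlier tree
```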


Tree Induction

- Greedy strategy: split the records based on an attribute test that optimizes a certain criterion
- Issues
  - Determine how to split the records
    - How to specify the attribute test condition?
    - How to determine the best split?
  - Determine when to stop splitting
Brief Review of Entropy

- Entropy is a measure of the uncertainty associated with a random variable
  - High entropy -> high uncertainty
  - Low entropy -> low uncertainty
- It is also known as a measure of dispersion

    Entropy(D) = -\sum_{i=1}^{m} p_i \log_2(p_i)        (m = 2 for a two-class problem)
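A tiny numeric illustration of the high/low-uncertainty bullets above (the probability values are made up for the demo):

```python
import math

def entropy(probs):
    """Entropy of a discrete distribution; the 0 * log(0) terms are dropped."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))    # 1.0   -> maximum uncertainty for m = 2
print(entropy([0.9, 0.1]))    # ~0.47 -> low uncertainty
print(entropy([1.0, 0.0]))    # 0.0   -> no uncertainty at all
```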
Attribute Selection Measure: Information Gain (ID3/C4.5)

- Select the attribute with the highest information gain
- Let p_i be the probability that an arbitrary tuple in D belongs to class C_i, estimated by |C_{i,D}| / |D|
- Expected information (entropy) needed to classify a tuple in D:

    Info(D) = -\sum_{i=1}^{m} p_i \log_2(p_i)

- Information needed (after using A to split D into v partitions) to classify D:

    Info_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times Info(D_j)

- Information gained by branching on attribute A:

    Gain(A) = Info(D) - Info_A(D)
Examples for Computing Entropy

    Entropy(t) = -\sum_{j} p(j|t) \log_2 p(j|t)

  C1 = 0, C2 = 6:  P(C1) = 0/6 = 0,  P(C2) = 6/6 = 1
                   Entropy = -0 log 0 - 1 log 1 = -0 - 0 = 0

  C1 = 1, C2 = 5:  P(C1) = 1/6,  P(C2) = 5/6
                   Entropy = -(1/6) log2(1/6) - (5/6) log2(5/6) = 0.65

  C1 = 2, C2 = 4:  P(C1) = 2/6,  P(C2) = 4/6
                   Entropy = -(2/6) log2(2/6) - (4/6) log2(4/6) = 0.92
Attribute Selection: Information Gain

- Class P: buys_computer = "yes" (9 tuples); Class N: buys_computer = "no" (5 tuples)

    Info(D) = I(9,5) = -\frac{9}{14}\log_2\frac{9}{14} - \frac{5}{14}\log_2\frac{5}{14} = 0.940

- Splitting on age (class counts per value, read off the buys_computer table above):

    age      p_i   n_i   I(p_i, n_i)
    <=30     2     3     0.971
    31..40   4     0     0
    >40      3     2     0.971

    Info_age(D) = \frac{5}{14} I(2,3) + \frac{4}{14} I(4,0) + \frac{5}{14} I(3,2) = 0.694

  Here \frac{5}{14} I(2,3) means that the branch "age <= 30" has 5 of the 14 samples, with 2 yes's and 3 no's. Hence

    Gain(age) = Info(D) - Info_age(D) = 0.246

- Similarly,

    Gain(income) = 0.029
    Gain(student) = 0.151
    Gain(credit_rating) = 0.048

  so age has the highest information gain and is selected as the splitting attribute at the root.
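The slide's figures can be checked with a few lines of Python (the helper function I is mine; the class counts per attribute value are read off the buys_computer table):

```python
import math

def I(*counts):
    """Expected information I(p, n, ...) for a tuple of class counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

info_D = I(9, 5)
info_age = 5/14 * I(2, 3) + 4/14 * I(4, 0) + 5/14 * I(3, 2)
print(round(info_D, 3), round(info_age, 3))     # 0.94 and 0.694, as on the slide

# Class counts per value for the remaining attributes, from the same table.
info_income = 4/14 * I(2, 2) + 6/14 * I(4, 2) + 4/14 * I(3, 1)
info_student = 7/14 * I(6, 1) + 7/14 * I(3, 4)
info_credit = 8/14 * I(6, 2) + 6/14 * I(3, 3)

# The slide quotes 0.246, 0.029, 0.151 and 0.048; tiny differences in the
# third decimal place come only from rounding.
for name, info in [("age", info_age), ("income", info_income),
                   ("student", info_student), ("credit_rating", info_credit)]:
    print(f"Gain({name}) = {info_D - info:.3f}")
```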
Gini Index (CART, IBM IntelligentMiner)

- If a data set D contains examples from n classes, the gini index gini(D) is defined as

    gini(D) = 1 - \sum_{j=1}^{n} p_j^2

  where p_j is the relative frequency of class j in D
- If D is split on attribute A into two subsets D_1 and D_2, the gini index of the split is

    gini_A(D) = \frac{|D_1|}{|D|} gini(D_1) + \frac{|D_2|}{|D|} gini(D_2)

- The reduction in impurity is

    \Delta gini(A) = gini(D) - gini_A(D)
Computation of Gini Index

- Ex.: D has 9 tuples with buys_computer = "yes" and 5 with "no":

    gini(D) = 1 - \left(\frac{9}{14}\right)^2 - \left(\frac{5}{14}\right)^2 = 0.459
Examples for Computing Gini

    Gini(t) = 1 - \sum_{j} [p(j|t)]^2

  C1 = 0, C2 = 6:  P(C1) = 0/6 = 0,  P(C2) = 6/6 = 1
                   Gini = 1 - P(C1)^2 - P(C2)^2 = 1 - 0 - 1 = 0

  C1 = 1, C2 = 5:  P(C1) = 1/6,  P(C2) = 5/6
                   Gini = 1 - (1/6)^2 - (5/6)^2 = 0.278

  C1 = 2, C2 = 4:  P(C1) = 2/6,  P(C2) = 4/6
                   Gini = 1 - (2/6)^2 - (4/6)^2 = 0.444
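The same numbers can be reproduced directly from the class counts (the gini helper below is an illustrative one-liner of mine, not library code):

```python
def gini(*counts):
    """Gini index 1 - sum_j p_j^2 for a tuple of class counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

print(round(gini(9, 5), 3))   # 0.459, the buys_computer data: 9 "yes", 5 "no"
print(round(gini(0, 6), 3))   # 0.0
print(round(gini(1, 5), 3))   # 0.278
print(round(gini(2, 4), 3))   # 0.444
```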
Third Measure: Classification Error

- A third impurity measure is the classification error at a node t: Error(t) = 1 - \max_i p(i|t), the fraction of records misclassified if the node is labeled with its majority class.

Three in One

- (Figure: comparison of the three measures, entropy, Gini index, and classification error, as a function of the class proportion in a two-class problem.)
Tree Induction

- Greedy strategy: split the records based on an attribute test that optimizes a certain criterion
- Issues
  - Determine how to split the records
    - How to specify the attribute test condition?
    - How to determine the best split?
  - Determine when to stop splitting
Stopping Criteria for Tree Induction

- Stop expanding a node when all the records belong to the same class
Decision Tree Based Classification

- Advantages:
  - Extremely fast at classifying unknown records
  - Easy to interpret for small-sized trees
  - Accuracy is comparable to other classification techniques for many simple data sets
- Disadvantages:
  - Not scalable: if an attribute is added, the whole tree may need to be built again
  - Accuracy can be poor on large datasets
  - Not robust: limited handling of attributes with large numbers of values


WEKA Complete Book

- WEKA provides a wiki covering the concepts of machine learning and data mining
  - https://www.cs.waikato.ac.nz/ml/weka/book.html
- WEKA examples for decision trees have been uploaded as reading material
Examples

- Search for "Data Mining Lecture -- Decision Tree | Solved Example (Eng-Hindi)"
  - URL: https://www.youtube.com/watch?v=cKl7WV_EKDU
- You should all solve the complete example of the weather data
  - 30-minute video
  - 2-hour solution
More Examples

- Additional material for learning:
  - http://www.otnira.com/2013/03/17/decision-tree-induction/
- A simple research paper based on weather data and a decision tree:
  - https://www.ijarcce.com/upload/2016/may-16/IJARCCE%20114.pdf
DT Classification Task (optional)

(This slide repeats the earlier "Decision Tree Classification Task" figure: a tree-induction algorithm learns a model from the training set (Tids 1-10), and the model is then applied to the test set (Tids 11-15).)
Everyone can have a decision tree in mind for every task.
