
Ho Chi Minh University of Banking

Department of Economic Mathematics

Machine Learning
Decision Tree

Vuong Trong Nhân (nhanvt@hub.edu.vn)


Outline
 Decision tree representation
 ID3 learning algorithm
 Which attribute is best?
 C4.5: real valued attributes
 Which hypothesis is best?
 Noise
 From Trees to Rules
 Miscellaneous

2
Decision Tree Representation
Day Outlook Temperature Humidity Wind PlayTennis
D1 Sunny Hot High Weak No
D2 Sunny Hot High Strong No
D3 Overcast Hot High Weak Yes
D4 Rain Mild High Weak Yes
D5 Rain Cool Normal Weak Yes
D6 Rain Cool Normal Strong No
D7 Overcast Cool Normal Strong Yes
D8 Sunny Mild High Weak No
D9 Sunny Cool Normal Weak Yes
D10 Rain Mild Normal Weak Yes
D11 Sunny Mild Normal Strong Yes
D12 Overcast Mild High Strong Yes
D13 Overcast Hot Normal Weak Yes
D14 Rain Mild High Strong No

 Outlook, Temperature, etc.: attributes


 PlayTennis: class
 Shall I play tennis today?
3
Decision Tree for PlayTennis

Outlook
├─ Sunny → Humidity
│     ├─ High → No
│     └─ Normal → Yes
├─ Overcast → Yes
└─ Rain → Wind
      ├─ Strong → No
      └─ Weak → Yes

4
Alternative Decision Tree for PlayTennis
Temperature
├─ Hot → {1, 2, 3, 13}: Humidity
│     ├─ Normal → {13}: YES
│     └─ High → {1, 2, 3}: Wind
│           ├─ Weak → {1, 3}: Outlook
│           │     ├─ Sunny → {1}: NO
│           │     └─ Overcast → {3}: YES
│           └─ Strong → {2}: NO
├─ Mild → {4, 8, 10, 11, 12, 14}: ...
└─ Cool → {5, 6, 7, 9}: ...

• What is different?
• The sequence of attributes influences the size and shape of the tree
5
Occam’s Principle

 Occam’s Principle:
“If two theories explain the
facts equally well, then the
simpler theory is preferred”

• Prefer the smallest tree that correctly classifies all training examples

6
Decision Trees
Decision tree representation:
• Each internal node tests an attribute
• Each branch corresponds to attribute value
• Each leaf node assigns a classification

How would we represent ∧, ∨, XOR?

Example XOR (attributes A and B):

A
├─ yes → B
│     ├─ yes → NO
│     └─ no → YES
└─ no → B
      ├─ yes → YES
      └─ no → NO

7
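As an illustration (not part of the original slides), the XOR tree above can be written directly as nested attribute tests; A and B are taken here as boolean inputs:

```python
def xor_tree(a: bool, b: bool) -> bool:
    """The XOR decision tree above, written as nested attribute tests."""
    if a:                  # test attribute A first
        if b:              # on the A = yes branch, test B
            return False   # NO
        return True        # YES
    if b:                  # on the A = no branch, test B
        return True        # YES
    return False           # NO

# Matches the XOR truth table: exactly one of A, B being "yes" gives YES.
assert [xor_tree(a, b) for a in (True, False) for b in (True, False)] == [False, True, True, False]
```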
When to Consider Decision Trees

 Instances describable by attribute–value pairs


 Target function is discrete valued
 Disjunctive hypothesis may be required
 Possibly noisy training data
 Interpretable result of learning is required

 Examples:
 Medical diagnosis
 Text classification
 Credit risk analysis
8
Top-Down Induction of Decision Trees, ID3

 ID3 (Quinlan, 1986) operates on whole training set S


 Algorithm:
1. create a new node
2. If current training set is sufficiently pure:
• Label node with respective class
• We’re done
3. Else:
• x → the “best” decision attribute for current training set
• Assign x as decision attribute for node
• For each value of x, create new descendant of node
• Sort training examples to leaf nodes
• Iterate over new leaf nodes and apply algorithm recursively

9
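A minimal Python sketch of this recursive procedure, assuming the training set is a list of dicts with a "label" key and categorical attributes; the "best" attribute is chosen by information gain, which the following slides introduce. This is an illustration, not Quinlan's original implementation.

```python
from collections import Counter
import math

def entropy(examples):
    """Entropy of the class labels in a list of examples (dicts with a 'label' key)."""
    counts = Counter(e["label"] for e in examples)
    return -sum(c / len(examples) * math.log2(c / len(examples)) for c in counts.values())

def best_attribute(examples, attributes):
    """Attribute with the highest information gain on the current training set."""
    def gain(x):
        remainder = 0.0
        for v in {e[x] for e in examples}:
            subset = [e for e in examples if e[x] == v]
            remainder += len(subset) / len(examples) * entropy(subset)
        return entropy(examples) - remainder
    return max(attributes, key=gain)

def id3(examples, attributes):
    labels = {e["label"] for e in examples}
    if len(labels) == 1:                       # sufficiently pure: only one class left
        return labels.pop()
    if not attributes:                         # no attributes left: majority class
        return Counter(e["label"] for e in examples).most_common(1)[0][0]
    x = best_attribute(examples, attributes)   # the "best" decision attribute
    children = {}
    for v in {e[x] for e in examples}:         # one descendant per value of x
        subset = [e for e in examples if e[x] == v]
        children[v] = id3(subset, [a for a in attributes if a != x])
    return (x, children)                       # internal node: tested attribute + branches
```

Called as, for example, id3(rows, ["Outlook", "Temperature", "Humidity", "Wind"]) on the PlayTennis rows written as dicts, it returns a nested (attribute, branches) structure.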
Example ID3
• Look at current training set S

• Determine best attribute

• Split training set according to different values

10
Example ID3

• Tree

• Apply algorithm recursively

11
Example – Resulting Tree

Outlook

Sunny Overcast Rain

Humidity Yes Wind

High Normal Strong Weak

No Yes No Yes

12
ID3 – Intermediate Summary

• Recursive splitting of the training set


• Stop, if current training set is sufficiently pure
• ... What does "pure" mean? Can we allow for errors?
• What is the best attribute?
• How can we tell that the tree is really good?
• How shall we deal with continuous values?

13
Which attribute is best?

• Assume a training set {+, +, −, −, +, −, +, +, −, −} (only the class labels are given)
• Assume binary attributes x1, x2, and x3
• Produced splits:

Value 1 Value 2
x1 {+, +, −, −, + } {−, +, +, −, −}
x2 {+} {+, −, −, +, −, +, +, −, −}
x3 {+, +, +, +, −} {−, −, −, −, + }

• No attribute is perfect
• Which one to choose?

14
Entropy

• p⊕ is the proportion of positive examples
• p⊖ is the proportion of negative examples
• Entropy measures the impurity of S:

Entropy(S) ≡ −p⊕ log2 p⊕ − p⊖ log2 p⊖

[Figure: Entropy(S) plotted against p⊕; it rises from 0 at p⊕ = 0 to 1.0 at p⊕ = 0.5 and falls back to 0 at p⊕ = 1]

• Information can be seen as the negative of entropy
15
Entropy

S = {+ + + + + + + + +, − − − − −} = {9+, 5−}. Entropy(S) = ?
Entropy(S) = −9/14 log2(9/14) − 5/14 log2(5/14) = 0.94

S = {+ + + + + + + +, − − − − − −} = {8+, 6−}. Entropy(S) = ?
Entropy(S) = −8/14 log2(8/14) − 6/14 log2(6/14) = 0.98

S = {+ + + + + + + + + + + + + +} = {14+}. Entropy(S) = ?
Entropy(S) = 0

S = {+ + + + + + +, − − − − − − −} = {7+, 7−}. Entropy(S) = ?
Entropy(S) = 1
16
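The four values above can be reproduced with a few lines of Python (an illustrative check, not part of the slides):

```python
import math

def entropy(pos, neg):
    """Binary entropy of a set with `pos` positive and `neg` negative examples."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:                      # 0 * log2(0) is treated as 0
            p = count / total
            result -= p * math.log2(p)
    return result

print(round(entropy(9, 5), 3))    # 0.94
print(round(entropy(8, 6), 3))    # 0.985 (the slide rounds this to 0.98)
print(round(entropy(14, 0), 3))   # 0.0
print(round(entropy(7, 7), 3))    # 1.0
```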
Entropy

• All members of S belong to the same class
  → Entropy(S) = 0 (the purest set)

• The numbers of positive and negative examples are equal (p⊕ = p⊖ = 0.5)
  → Entropy(S) = 1 (maximum impurity)

• The numbers of positive and negative examples are unequal
  → Entropy is between 0 and 1
17
Information Gain
• Measuring attribute x creates subsets S1 and S2 with different entropies
• Taking the weighted mean of Entropy(S1) and Entropy(S2) gives the conditional entropy Entropy(S|x), i.e. in general:

  Entropy(S|x) = Σ_{v ∈ Values(x)} (|S_v| / |S|) · Entropy(S_v)

• → Choose the attribute that maximizes the difference:

  Gain(S, x) ≡ Entropy(S) − Entropy(S|x)

• Gain(S, x) = expected reduction in entropy due to partitioning on x:

  Gain(S, x) = Entropy(S) − Σ_{v ∈ Values(x)} (|S_v| / |S|) · Entropy(S_v)
18
Information Gain: Definition

Gain(S, x) ≡ Entropy(S) − Σ_{v ∈ Values(x)} (|S_v| / |S|) · Entropy(S_v)

• Values(x): the set of all possible values for the attribute x
• S_v: the subset of S for which attribute x has value v

• Information Gain is a measure of the effectiveness of an attribute in classifying data.
• It is the expected reduction in entropy caused by partitioning the objects according to this attribute.

19
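As an illustrative check (not from the slides), the definition can be applied to the three candidate splits x1, x2, x3 from the "Which attribute is best?" slide, with the sets written simply as lists of '+' and '-' labels:

```python
from collections import Counter
import math

def entropy(labels):
    return -sum(c / len(labels) * math.log2(c / len(labels))
                for c in Counter(labels).values())

def information_gain(parent, subsets):
    """Gain = Entropy(S) minus the size-weighted entropies of the subsets S_v."""
    remainder = sum(len(s) / len(parent) * entropy(s) for s in subsets)
    return entropy(parent) - remainder

S = list("++--+-++--")                          # the 10-example set
splits = {
    "x1": [list("++--+"), list("-++--")],
    "x2": [list("+"),     list("+--+-++--")],
    "x3": [list("++++-"), list("----+")],
}
for name, subsets in splits.items():
    print(name, round(information_gain(S, subsets), 3))
# x1 0.029, x2 0.108, x3 0.278 -> x3 reduces entropy the most
```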
Example - Training Set
Day Outlook Temperature Humidity Wind PlayTennis
D1 Sunny Hot High Weak No
D2 Sunny Hot High Strong No
D3 Overcast Hot High Weak Yes
D4 Rain Mild High Weak Yes
D5 Rain Cool Normal Weak Yes
D6 Rain Cool Normal Strong No
D7 Overcast Cool Normal Strong Yes
D8 Sunny Mild High Weak No
D9 Sunny Cool Normal Weak Yes
D10 Rain Mild Normal Weak Yes
D11 Sunny Mild Normal Strong Yes
D12 Overcast Mild High Strong Yes
D13 Overcast Hot Normal Weak Yes
D14 Rain Mild High Strong No

20
Example
Gain(S, x) ≡ Entropy(S) − Σ_{v ∈ Values(x)} (|S_v| / |S|) · Entropy(S_v)

For the top node: S = {9+, 5−}, Entropy(S) = 0.94

Attribute Wind:
  S_weak = {6+, 2−}, |S_weak| = 8   (Weak days: D1, D8 → No; D3, D4, D5, D9, D10, D13 → Yes)
  S_strong = {3+, 3−}, |S_strong| = 6   (Strong days: D2, D6, D14 → No; D7, D11, D12 → Yes)

  Entropy(S_weak) = −6/8 log2(6/8) − 2/8 log2(2/8) = 0.81
  Entropy(S_strong) = 1

Expected entropy when splitting on attribute 'Wind':
  Entropy(S|Wind) = 8/14 · Entropy(S_weak) + 6/14 · Entropy(S_strong) = 0.89

Gain(S, Wind) = 0.94 − 0.89 ≈ 0.05
21
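The arithmetic above can be verified in a few lines (an illustrative sketch, not part of the slides):

```python
import math

def H(pos, neg):                              # binary entropy
    return -sum(c / (pos + neg) * math.log2(c / (pos + neg)) for c in (pos, neg) if c)

e_weak, e_strong = H(6, 2), H(3, 3)           # 0.811 and 1.0
e_cond = 8/14 * e_weak + 6/14 * e_strong      # Entropy(S | Wind) ≈ 0.892
print(round(H(9, 5) - e_cond, 3))             # Gain(S, Wind) ≈ 0.048
```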
Selecting the Next Attribute
• For the whole training set:
   Gain(S, Outlook) = 0.246
   Gain(S, Humidity) = 0.151
   Gain(S, Wind) = 0.048
   Gain(S, Temperature) = 0.029
→ Outlook should be used to split the training set!

• Further down in the tree, Entropy(S) is computed locally
• Usually, the tree does not have to be minimized
• This is one reason for the good performance of ID3!

22
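These four gains can be reproduced directly from the PlayTennis table; a sketch, assuming the data is typed in as plain tuples:

```python
from collections import Counter
import math

# (Outlook, Temperature, Humidity, Wind, PlayTennis) for D1..D14
DATA = [
    ("Sunny", "Hot", "High", "Weak", "No"),         ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),     ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),      ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),     ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),   ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),   ("Rain", "Mild", "High", "Strong", "No"),
]
ATTRS = {"Outlook": 0, "Temperature": 1, "Humidity": 2, "Wind": 3}

def entropy(rows):
    counts = Counter(r[-1] for r in rows)
    return -sum(c / len(rows) * math.log2(c / len(rows)) for c in counts.values())

def gain(rows, attr):
    i = ATTRS[attr]
    remainder = 0.0
    for v in {r[i] for r in rows}:
        subset = [r for r in rows if r[i] == v]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(rows) - remainder

for a in sorted(ATTRS, key=lambda a: gain(DATA, a), reverse=True):
    print(a, round(gain(DATA, a), 3))
# Outlook 0.247, Humidity 0.152, Wind 0.048, Temperature 0.029
# (the slide truncates the first two to 0.246 and 0.151)
```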
Next step in growing the decision tree

23
The Resulting Decision Tree & Its Rules

Read off the resulting tree, one rule per root-to-leaf path:

IF Outlook = Sunny AND Humidity = High THEN PlayTennis = No
IF Outlook = Sunny AND Humidity = Normal THEN PlayTennis = Yes
IF Outlook = Overcast THEN PlayTennis = Yes
IF Outlook = Rain AND Wind = Strong THEN PlayTennis = No
IF Outlook = Rain AND Wind = Weak THEN PlayTennis = Yes

24
Some issues: Real-Valued Attributes
 Temperature = 82.5
 Create discrete attributes to test continuous:
 (Temperature > 54) = true or = false
 Sort attribute values that occur in training set:

Temperature: 40 48 60 72 80 90
PlayTennis: No No Yes Yes Yes No

 Determine points where the class changes


 Candidates are (48 + 60) / 2 and (80 + 90) / 2
 Select best one using info gain
 Implemented in the system C4.5 (successor of ID3)

25
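A sketch of the threshold-selection step on the six-example sequence above (illustrative only, not C4.5's actual code): candidate thresholds are the midpoints where the class changes, and each candidate is scored by information gain.

```python
import math

temps  = [40, 48, 60, 72, 80, 90]
labels = ["No", "No", "Yes", "Yes", "Yes", "No"]

def entropy(ls):
    result = 0.0
    for c in set(ls):
        p = ls.count(c) / len(ls)
        result -= p * math.log2(p)
    return result

# Candidate thresholds: midpoints between adjacent values where the class changes.
candidates = [(a + b) / 2 for a, b, la, lb in
              zip(temps, temps[1:], labels, labels[1:]) if la != lb]   # [54.0, 85.0]

for t in candidates:
    below = [l for v, l in zip(temps, labels) if v <= t]
    above = [l for v, l in zip(temps, labels) if v > t]
    gain = entropy(labels) - (len(below) * entropy(below) + len(above) * entropy(above)) / len(labels)
    print(t, round(gain, 3))
# 54.0 -> 0.459, 85.0 -> 0.191: the boolean test (Temperature > 54) is selected
```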
Some issues: Noise
 Consider adding noisy (=wrongly labeled) training example #15:
  Sunny, Mild, Normal, Weak, PlayTennis = No (i.e. Outlook = Sunny, Humidity = Normal)
  What effect on the earlier tree?

Outlook
├─ Sunny → Humidity
│     ├─ High → No
│     └─ Normal → Yes
├─ Overcast → Yes
└─ Rain → Wind
      ├─ Strong → No
      └─ Weak → Yes

26
Some issues: Overfitting

Outlook
├─ Sunny → Humidity
│     ├─ High → No
│     └─ Normal → Yes
├─ Overcast → Yes
└─ Rain → Wind
      ├─ Strong → No
      └─ Weak → Yes

• The algorithm will introduce a new test
• This is unnecessary, because the new example was erroneous due to the presence of noise
→ Overfitting corresponds to learning coincidental regularities
• Unfortunately, we generally don't know which examples are noisy
• ... and also not the amount, e.g. the percentage, of noisy examples
27
Some issues: Overfitting
 An example: continuing to grow the tree can improve accuracy on the training data while accuracy on the test data gets worse.

28
[Mitchell, 1997]
Overfitting: solutions
 Some solutions:
  Stop learning early: stop growing the tree before it fits the training data perfectly.
  Prune the full tree: grow the tree to its full size, and then post-prune it.
 It is hard to decide when to stop learning early.
 Post-pruning empirically results in better performance. But:
  How do we decide the right size of a tree?
  When do we stop pruning?
  We can use a validation set for pruning, e.g. reduced-error pruning or rule post-pruning (see the sketch below).
29
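As one possible illustration of post-pruning guided by a validation set (this sketch uses scikit-learn's cost-complexity pruning, not reduced-error pruning itself; the dataset is just an example):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Grow the full tree, then evaluate a range of pruned versions on a held-out
# validation set and keep the one that generalises best.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)
best = max(
    (DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X_train, y_train)
     for a in path.ccp_alphas),
    key=lambda tree: tree.score(X_val, y_val),
)
print(best.get_n_leaves(), round(best.score(X_val, y_val), 3))
```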
Summary

 A decision tree is a Machine Learning algorithm that can perform both classification and regression tasks.
 A decision tree represents a function as a tree.
 Each decision tree can be interpreted as a set of IF-THEN rules.
 Decision trees have been used in many practical applications.

30
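For completeness, a minimal scikit-learn example (an illustration, not from the slides) that fits a small classification tree and prints it as nested IF-THEN style rules:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# export_text prints the tree as nested threshold tests, i.e. IF-THEN style rules.
print(export_text(tree, feature_names=list(iris.feature_names)))
print(tree.predict(iris.data[:1]))   # class prediction for the first example
```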
Advantages & Disadvantages

 Advantages
 It is simple to understand, as it follows the same process a human follows when making a decision in real life.
 It can be very useful for solving decision-related problems.
 It helps to think about all the possible outcomes for a problem.
 It requires less data cleaning than many other algorithms.

31
Advantages & Disadvantages

 Disadvantages
 A decision tree can contain many layers, which makes it complex.
 It may have an overfitting issue, which can be resolved using the Random Forest algorithm.
 With more class labels, the computational complexity of the decision tree may increase.

32
Random forests
 Random forests (RF) is a method by Leo Breiman
(2001) for both classification and regression.
 Main idea: the prediction is based on a combination of many decision trees, obtained by averaging all individual predictions.
 Each tree in RF is simple but random.
 Each tree is grown differently, depending on the choices of attributes and training data.

33
Random forests
 RF currently is one of the most popular and accurate
methods [Fernández-Delgado et al., 2014]
 It is also very general.
 RF can be implemented easily and efficiently.
 It can work with problems of very high dimensions,
without overfitting

34
How Random Forests Work

35
RF: three basic ingredients

 Randomization and no pruning:


 For each tree and at each node, we select randomly a subset of
attributes.
 Find the best split, and then grow appropriate subtrees.
 Every tree will be grown to its largest size without pruning.

 Combination: each final prediction is made by taking the average of all predictions of the individual trees.

 Bagging: the training set for each tree is generated by


sampling (with replacement) from the original data.

36
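A minimal scikit-learn sketch of these three ingredients (illustrative only; the per-node attribute subsampling and the bagging described above correspond to the max_features and bootstrap parameters, and the dataset is just an example):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

rf = RandomForestClassifier(
    n_estimators=200,      # combine many unpruned trees
    max_features="sqrt",   # random subset of attributes considered at each node
    bootstrap=True,        # bagging: each tree sees a bootstrap sample of the data
    random_state=0,
)
print(round(cross_val_score(rf, X, y, cv=5).mean(), 3))   # accuracy averaged over 5 folds
```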
Exercise

1. Build a decision tree.

2. Predict:
Customer (15, youth, medium, no, fair)
Customer (16, senior, low, yes, excellent)

37
