
Decision Tree

Decision Tree - Introduction

+non-parametric supervised learning approach
(it does not assume the data follows a normal distribution)
+can be applied to both regression and classification problems
+uses a tree analogy to model a sequential decision process
Decision Tree - Introduction
+Start from the root node
+a feature is evaluated, and one of the two branches is selected
+each node in the tree is basically a decision rule
+the procedure is repeated until a final leaf is reached, which gives the target (prediction)
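As a minimal sketch of this root-to-leaf procedure (assuming scikit-learn, which later slides also reference; the data below is synthetic and only for illustration):

```python
# Sketch: fit a small decision tree and print its decision rules.
# A prediction starts at the root node and follows one branch per
# decision rule until a leaf (the target) is reached.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=100, n_features=4, random_state=0)

tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X, y)

# Each printed line is one decision rule on a feature
print(export_text(tree, feature_names=[f"f{i}" for i in range(4)]))
print(tree.predict(X[:5]))  # class predictions for the first five rows
```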
Types of Decision Tree Algorithms
+ID3 (Iterative Dichotomiser 3)
+C4.5
+C5.0
+CART (Classification and Regression trees)
Types of Decision Tree Algorithms
ID3 (Iterative Dichotomiser 3)
• developed in 1986 by Ross Quinlan
• creates a multiway tree, finding for each node (i.e., in a greedy manner) the categorical feature that will yield the largest information gain for categorical targets
• trees are grown to their maximum size and then a pruning step is usually applied to improve the ability of the tree to generalize to unseen data
C4.5
• developed in 1993 by Ross Quinlan as the successor to ID3
• removed the restriction that features must be categorical by dynamically defining a discrete attribute (based on numerical variables) that partitions the continuous attribute value into a discrete set of intervals
• converts the trained trees (i.e., the output of the ID3 algorithm) into sets of if-then rules; the accuracy of each rule is then evaluated to determine the order in which they should be applied
• pruning is done by removing a rule's precondition if the accuracy of the rule improves without it
C5.0
• Quinlan's latest version, released under a proprietary license
• uses less memory and builds smaller rulesets than C4.5 while being more accurate
CART (Classification and Regression Trees)
• very similar to C4.5, but differs in that it supports numerical target variables (regression) and does not compute rule sets
• constructs binary trees using the feature and threshold that yield the largest information gain at each node
Terminology

Source: https://medium.datadriveninvestor.com/the-basics-of-decision-trees-e5837cc2aba7
Terminology
+Root node : the first node of the decision tree
+Splitting : the process of dividing a node into two or more sub-nodes, starting from the root node
+Node : the result of splitting the root node into sub-nodes and splitting sub-nodes into further sub-nodes
+Leaf or terminal node : a node that cannot be split any further; it marks the end of a branch
+Pruning : a technique to reduce the size of the decision tree by removing sub-nodes. The aim is to reduce complexity, improve predictive accuracy, and avoid overfitting
+Branch / Sub-tree : a subsection of the entire tree is called a branch or sub-tree
Intuition

+Case: predicting the salary of a baseball player (the Hitters data set)
+The root node splits on the feature Years < 4.5; the right branch is then split on Hits < 117.5, and each terminal node (leaf) predicts the mean log salary of its region (5.11, 6.00, and 6.74)
+The 3 regions can be written as:
• R1 = {X | Years < 4.5}
• R2 = {X | Years >= 4.5, Hits < 117.5}
• R3 = {X | Years >= 4.5, Hits >= 117.5}
+Note:
• Salary is measured as the mean log salary within each region
• Hits is the number of hits the player made in the previous year
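A small sketch of the same idea with a depth-2 regression tree (the data below is synthetic and only mimics the Hitters structure; the real data set is not loaded here):

```python
# Sketch: a depth-2 regression tree carves the (Years, Hits) plane into
# rectangular regions like R1-R3 above. Data is synthetic.
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(0)
years = rng.integers(1, 20, size=300)
hits = rng.integers(0, 240, size=300)
# Synthetic log-salary that roughly mimics the three-region structure
log_salary = np.where(years < 4.5, 5.1,
                      np.where(hits < 117.5, 6.0, 6.7)) + rng.normal(0, 0.2, 300)

X = np.column_stack([years, hits])
reg = DecisionTreeRegressor(max_depth=2).fit(X, log_salary)

# The printed splits correspond to regions like R1, R2, R3
print(export_text(reg, feature_names=["Years", "Hits"]))
```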
Splitting in Decision Trees
+split the nodes at the most informative features
using the decision algorithm
+start at the tree root and split the data on the
feature that results in the largest information
gain (IG)
+the objective function is to maximize the
information gain (IG) at each split
Splitting in Decision Trees
• Select a feature and split the data into a binary tree
• Continue splitting with the available features
• Until:
  • Leaf node(s) are pure (only one class remains)
  • A maximum depth is reached
  • A performance metric is achieved
Splitting in Decision Trees
+IG(Dp, f) = I(Dp) − Σj (Nj / Np) · I(Dj)    (1)
+f : the feature used to perform the split
+Dp : the data set of the parent node
+Dj : the data set of the j-th child node
+I : the impurity measure
+Np : the total number of samples at the parent node
+Nj : the number of samples in the j-th child node
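A minimal sketch of equation (1), assuming the impurity measure is passed in as a callable (the function names here are illustrative; Gini, entropy, and misclassification error are sketched a little further below):

```python
# Sketch of equation (1): information gain = parent impurity minus the
# weighted impurities of the child nodes.
def information_gain(parent, children, impurity):
    """parent: 1-D array of class labels at the parent node.
    children: list of 1-D label arrays, one per child node.
    impurity: a node impurity function (Gini, entropy, ...)."""
    n_parent = len(parent)
    weighted_child = sum(len(c) / n_parent * impurity(c) for c in children)
    return impurity(parent) - weighted_child
```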
Splitting in Decision Trees
+information gain is simply the difference between the impurity of the parent node and the weighted sum of the child node impurities
+the lower the impurity of the child nodes, the larger the information gain
+for simplicity and to reduce the combinatorial search space, most libraries (including scikit-learn) implement binary decision trees - each parent node is split into two child nodes, Dleft and Dright:
+IG(Dp, f) = I(Dp) − (Nleft / Np) · I(Dleft) − (Nright / Np) · I(Dright)    (2)
+there are 3 common impurity measures or splitting criteria used in binary decision trees:
+Gini impurity (IG), entropy (IH), and misclassification error (IE)
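Hedged sketches of the three impurity measures named above, each computed from the class labels that fall into a node (the function names are my own):

```python
# Sketch of the three impurity measures for a node, given its class labels.
import numpy as np

def gini(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def misclassification_error(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - p.max()
```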
Sample Dataset
+A simple simulation with the Heart Disease data set, which has 303 rows and 13 attributes. The target consists of 138 rows with value 0 and 165 rows with value 1
+We use the sex, fbs (fasting blood sugar), exang (exercise induced angina), and target attributes
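A loading sketch, assuming the data sits in a local CSV file named heart.csv with the usual UCI column names (the file name and column names are assumptions, not given in the slides):

```python
# Sketch: load the heart disease data and keep the attributes used here.
import pandas as pd

df = pd.read_csv("heart.csv")            # assumed local file
data = df[["sex", "fbs", "exang", "target"]]

print(data.shape)                        # expected: (303, 4)
print(data["target"].value_counts())     # expected: 165 ones, 138 zeros
```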
Gini Impurity
+used by the CART (classification and regression tree)
algorithm for classification trees
+a measure of how often a randomly chosen element from the set would be incorrectly labeled if it were randomly labeled according to the distribution of labels in the subset
+IG(t) = 1 − Σj p(j|t)²
+where j indexes the classes present in the node and p(j|t) is the proportion of class j in node t
Gini Impurity
+determine which separation is best by measuring and comparing the Gini Impurity of each attribute
+the attribute with the lowest Gini Impurity on the first iteration becomes the root node
Gini Impurity – Sex attribute

Calculate the total Gini Impurity as a weighted average: the left node represents 138 patients while the right node represents 165 patients.
Gini Impurity – Fbs (fasting blood sugar) attribute

Calculate the total Gini Impurity as a weighted average: the left node represents 138 patients while the right node represents 165 patients.
Gini Impurity – Exang (exercise induced angina) attribute

Calculate the total Gini Impurity as a weighted average: the left node represents 138 patients while the right node represents 165 patients.

Fbs (fasting blood sugar) has the lowest total Gini Impurity, so we will use it as the Root Node.
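A sketch of the weighted-average ("total") Gini computation used in these comparisons; the per-node class counts below are placeholders, since the slide's contingency tables are not reproduced in the text:

```python
# Sketch: total (weighted) Gini impurity for one candidate binary split,
# computed from per-node class counts. The counts shown are placeholders.
def gini_from_counts(counts):
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def weighted_gini(left_counts, right_counts):
    n_left, n_right = sum(left_counts), sum(right_counts)
    n = n_left + n_right
    return (n_left / n) * gini_from_counts(left_counts) \
         + (n_right / n) * gini_from_counts(right_counts)

# e.g. left node: [count of target=0, count of target=1]; right node likewise
print(weighted_gini([40, 60], [98, 105]))  # placeholder counts
```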
Entropy
+used by the ID3, C4.5 and C5.0 tree-generation
algorithms
+information gain is based on the concept of entropy; the entropy measure is
+IH(t) = − Σj p(j|t) · log2 p(j|t)
+where j indexes the classes present in the node and p(j|t) is the proportion of class j in node t
Entropy
+compute the entropy of the Target attribute first
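Using the class counts given earlier (138 zeros and 165 ones), the target entropy can be checked as follows:

```python
# Entropy of the Target attribute, from the counts stated earlier
# (138 patients with target = 0, 165 with target = 1).
import math

n0, n1 = 138, 165
n = n0 + n1
p0, p1 = n0 / n, n1 / n
h_target = -(p0 * math.log2(p0) + p1 * math.log2(p1))
print(round(h_target, 4))  # approximately 0.994
```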
Entropy – Sex attribute
Entropy – Fbs attribute
Entropy – Exang attribute

Fbs (fasting blood sugar) yields the highest information gain (equivalently, the lowest weighted entropy of its child nodes), so we will use it as the Root Node. This matches the result we obtained from Gini Impurity.
Misclassification Impurity
+also called classification error: IE(t) = 1 − max_i p(i|t)
+this index is usually not the best choice for splitting because it is not particularly sensitive to differences between probability distributions
Classification Error vs Entropy
Classification error:
+ a flat function with its maximum at the center
+ the center represents ambiguity, a 50/50 split
+ splitting metrics favor results that are furthest away from the center
Entropy:
+ has the same maximum, but is curved
+ the curvature allows splitting to continue until the nodes are pure
Classification Error vs Entropy
+Classification error
+With classification error, the function is flat
+The final weighted-average classification error of the children can be identical to that of the parent
+Resulting in zero apparent gain and premature stopping
Classification Error vs Entropy
+Entropy gain
+The function has a "bulge" (it is strictly concave)
+This allows the weighted-average entropy of the children to be less than that of the parent
+Resulting in positive information gain and continued splitting
Gini Index
+In practice, the Gini index is often used for splitting
+The function is similar to entropy: it also has a bulge
+It does not contain a logarithm
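A small sketch that compares the three criteria for a binary node as a function of the positive-class proportion p, which makes the "bulge" versus "flat" distinction above concrete:

```python
# Sketch: compare the three splitting criteria for a binary node as a
# function of p, the proportion of the positive class in the node.
import numpy as np

p = np.linspace(0.01, 0.99, 99)
entropy_vals = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))
gini_vals = 1 - (p ** 2 + (1 - p) ** 2)
error_vals = 1 - np.maximum(p, 1 - p)

# All three peak at p = 0.5; entropy and Gini are strictly concave
# ("bulged"), while the misclassification error is piecewise linear (flat).
for name, vals in [("entropy", entropy_vals), ("gini", gini_vals),
                   ("misclassification", error_vals)]:
    print(name, round(vals.max(), 3))
```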
How to split on continuous data?
1. Order the data by the continuous feature, ascending
2. Calculate the average of each pair of adjacent values; these averages are the candidate thresholds
3. Calculate the information gain for each candidate threshold and split on the best one (see the sketch below)
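A sketch of these three steps, assuming a node impurity function such as the gini() sketched earlier is passed in:

```python
# Sketch: choosing a split threshold for a continuous feature.
import numpy as np

def best_threshold(values, labels, impurity):
    """values: 1-D continuous feature; labels: class labels;
    impurity: a node impurity function such as gini() above."""
    order = np.argsort(values)
    values, labels = values[order], labels[order]
    # Candidate thresholds: midpoints between adjacent sorted values
    candidates = (values[:-1] + values[1:]) / 2.0
    parent_imp, n = impurity(labels), len(labels)
    best_gain, best_t = -np.inf, None
    for t in np.unique(candidates):
        left, right = labels[values <= t], labels[values > t]
        if len(left) == 0 or len(right) == 0:
            continue
        gain = parent_imp - (len(left) / n) * impurity(left) \
                          - (len(right) / n) * impurity(right)
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_t, best_gain
```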
Decision Trees are High Variance

• Problem: decision trees tend to overfit
• Small changes in the data can greatly change the prediction - high variance
• Solution: prune the trees

• Variance is the variability of a model's prediction for a given data point; it tells us how spread out the predictions are
• A model with high variance pays too much attention to the training data and does not generalize to data it has not seen before
• Such models perform very well on the training data but have high error rates on test data
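A quick illustration of the overfitting problem on synthetic data: an unrestricted tree fits the training set almost perfectly but loses accuracy on held-out data, while limiting the depth narrows the gap:

```python
# Sketch: an unpruned (deep) tree overfits - near-perfect training accuracy
# but lower test accuracy - while a depth-limited tree narrows the gap.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, flip_y=0.1,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for depth in (None, 3):  # None = grow until leaves are pure (high variance)
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    print(f"max_depth={depth}: train={tree.score(X_tr, y_tr):.2f}, "
          f"test={tree.score(X_te, y_te):.2f}")
```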
Pruning Decision Trees

• How to decide which leaves to prune?
• Solution: prune based on a classification error threshold

E(t) = 1 − max_i p(i|t)
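In scikit-learn, pruning is exposed as minimal cost-complexity pruning (the ccp_alpha parameter) rather than the per-leaf error threshold above; a hedged sketch of that related mechanism:

```python
# Sketch: pruning in scikit-learn via cost-complexity pruning (ccp_alpha),
# a related but different criterion from the error threshold E(t) above.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, flip_y=0.1,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Candidate alphas along the cost-complexity pruning path
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_tr, y_tr)
for alpha in path.ccp_alphas[::5]:
    tree = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0).fit(X_tr, y_tr)
    print(f"alpha={alpha:.4f}  leaves={tree.get_n_leaves()}  "
          f"test acc={tree.score(X_te, y_te):.2f}")
```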
Strengths of Decision Trees

• Easy to interpret and implement: "if ... then ... else" logic
• Handle any data category: binary, ordinal, continuous
• No preprocessing or scaling required
