Classification: Decision Tree
Arvind Deshpande
Classification
• Supervised learning – the training data are accompanied by labels indicating the class of each observation.
• A classifier is learned from the training data set and then used to assign class labels to new data.
• e.g., a classification model that categorizes bank loan applications as either safe or risky (a minimal sketch follows below).
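A minimal sketch of this two-step use (train on labeled examples, then classify new ones), using scikit-learn's DecisionTreeClassifier; the loan features and toy values below are hypothetical, purely for illustration:

# Hedged sketch: learn from labeled loan applications, then classify a new one.
from sklearn.tree import DecisionTreeClassifier

# Hypothetical features: [income_in_thousands, years_employed, debt_ratio]
X_train = [[60, 5, 0.2], [25, 1, 0.8], [90, 10, 0.1], [30, 2, 0.7]]
y_train = ["safe", "risky", "safe", "risky"]       # class labels of past applications

model = DecisionTreeClassifier()
model.fit(X_train, y_train)                        # build the classifier from training data

print(model.predict([[45, 3, 0.4]]))               # classify a new, unlabeled application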
Model Construction
• A classification model is built from a training set in which the class label of each tuple is known.
Model Usage
• The model is then used to classify future or unseen data, after its accuracy has been estimated on an independent test set.
Problem
Decision tree
• A flowchart-like tree structure, where
• each internal node (nonleaf node) denotes a test on an attribute,
• each branch represents an outcome of the test, and
• each leaf node (or terminal node) holds a class label.
• The topmost node in a tree is the root node.
• The tree is built by a greedy, non-backtracking algorithm in a top-down, recursive, divide-and-conquer manner, and the resulting model is highly interpretable (a minimal node structure is sketched below).
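The structure described in the bullets above maps onto a very small data type. This is a hedged Python sketch (not from the slides): internal nodes hold an attribute test and one branch per test outcome, leaves hold a class label.

# Minimal sketch of the tree structure: internal nodes test an attribute,
# branches map test outcomes to subtrees, and leaf nodes hold a class label.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    attribute: Optional[str] = None                 # test attribute (internal node)
    branches: dict = field(default_factory=dict)    # test outcome -> child Node
    label: Optional[str] = None                     # class label (leaf node)

    def is_leaf(self) -> bool:
        return self.label is not None

def classify(node: Node, tuple_: dict) -> str:
    # Start at the root and follow the branch matching each test outcome.
    while not node.is_leaf():
        node = node.branches[tuple_[node.attribute]]
    return node.label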
Decision tree
• Algorithm: Generate_decision_tree. Generate a decision tree from the training tuples of data partition, D.
• Input:
• Data partition, D, which is a set of training tuples and their associated class labels;
• attribute_list, the set of candidate attributes;
• Attribute_selection_method, a procedure to determine the splitting criterion that “best” partitions the data tuples into individual classes. This criterion consists of a splitting attribute and, possibly, either a split-point or a splitting subset.
• Output: A decision tree.
Method
(1) create a node N;
(2) if tuples in D are all of the same class, C, then
(3) return N as a leaf node labeled with the class C;
(4) if attribute_list is empty then
(5) return N as a leaf node labeled with the majority class in D; // majority voting
(6) apply Attribute_selection_method(D, attribute_list) to find the “best” splitting_criterion;
(7) label node N with splitting_criterion;
(8) if splitting_attribute is discrete-valued and multiway splits allowed then // not restricted to binary trees
(9) attribute_list ← attribute_list − splitting_attribute; // remove splitting attribute
Method
(10) for each outcome j of splitting_criterion // partition the tuples and grow subtrees for each partition
(11) let Dj be the set of data tuples in D satisfying outcome j; // a partition
(12) if Dj is empty then
(13) attach a leaf labeled with the majority class in D to node N;
(14) else attach the node returned by Generate_decision_tree(Dj, attribute_list) to node N;
end for
(15) return N;
Method
• The algorithm uses the same process recursively to form a decision
tree for the tuples at each resulting partition, Dj , of D (step 14).
• The recursive partitioning stops only when any one of the following
terminating conditions is true:
1. All the tuples in partition D (represented at node N) belong to the
same class (steps 2 and 3).
2. There are no remaining attributes on which the tuples may be
further partitioned (step 4). In this case, majority voting is employed
(step 5). This involves converting node N into a leaf and labeling it
with the most common class in D. Alternatively, the class
distribution of the node tuples may be stored.
3. There are no tuples for a given branch, that is, a partition Dj is
empty (step 12). In this case, a leaf is created with the majority
class in D (step 13).
• The resulting decision tree is returned (step 15).
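The pseudocode and the three terminating conditions translate almost line for line into a short recursive function. Below is a hedged Python sketch (not the slides' code), assuming discrete-valued attributes with multiway splits, tuples represented as dicts with a "class" key, a caller-supplied attribute selection method, and a `domains` map giving every possible value of each attribute (so that an empty partition Dj can actually occur).

# Hedged sketch of Generate_decision_tree (steps 1-15), under the assumptions above.
from collections import Counter

def majority_class(D):
    # Most common class label in partition D (used for majority voting).
    return Counter(t["class"] for t in D).most_common(1)[0][0]

def generate_decision_tree(D, attribute_list, attribute_selection_method, domains):
    node = {}                                                   # step 1: create a node N
    classes = {t["class"] for t in D}
    if len(classes) == 1:                                       # steps 2-3: all tuples in one class
        return {"label": classes.pop()}
    if not attribute_list:                                      # steps 4-5: no attributes left
        return {"label": majority_class(D)}                     #            -> majority voting
    A = attribute_selection_method(D, attribute_list)           # step 6: choose the "best" attribute
    node["attribute"] = A                                       # step 7: label N with the criterion
    remaining = [a for a in attribute_list if a != A]           # steps 8-9: remove A (multiway split)
    node["branches"] = {}
    for outcome in domains[A]:                                  # step 10: one branch per outcome of A
        Dj = [t for t in D if t[A] == outcome]                  # step 11: the partition for outcome j
        if not Dj:                                              # steps 12-13: empty partition ->
            node["branches"][outcome] = {"label": majority_class(D)}  # leaf with majority class of D
        else:                                                   # step 14: grow the subtree
            node["branches"][outcome] = generate_decision_tree(
                Dj, remaining, attribute_selection_method, domains)
    return node                                                 # step 15: return N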
Information Gain
• Let node N represent or hold the tuples of partition D.
• The attribute with the highest information gain is chosen
as the splitting attribute for node N.
• This attribute minimizes the information needed to classify
the tuples in the resulting partitions and reflects the least
randomness or “impurity” in these partitions.
• Such an approach minimizes the expected number of
tests needed to classify a given tuple and guarantees that
a simple (but not necessarily the simplest) tree is found.
Information Gain
• The expected information needed to classify a tuple in D is given by
Info(D) = -\sum_{i=1}^{m} p_i \log_2(p_i)
• where p_i is the nonzero probability that an arbitrary tuple in D belongs to class C_i and is estimated by |C_{i,D}| / |D|.
• A log function to base 2 is used because the information is encoded in bits.
• Info(D), also known as the entropy of D, is just the average amount of information needed to identify the class label of a tuple in D.
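A minimal Python sketch of Info(D), computed from a list of class labels (the 9/5 class split below is just an illustrative example):

# Minimal sketch: Info(D), the entropy of partition D, from its class labels.
from collections import Counter
from math import log2

def info(labels):
    # Expected information (in bits) needed to classify a tuple in D.
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

print(round(info(["yes"] * 9 + ["no"] * 5), 3))    # 9 yes / 5 no tuples -> 0.940 bits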
Information Gain
• It is quite likely that the partitions will be impure (e.g., a partition may contain a collection of tuples from different classes rather than from a single class).
• The amount of information still needed (after partitioning on attribute A) to arrive at an exact classification is measured by
Info_A(D) = \sum_{j=1}^{v} (|D_j| / |D|) \times Info(D_j)
• The term |D_j| / |D| acts as the weight of the jth partition.
• Info_A(D) is the expected information required to classify a tuple from D based on the partitioning by A.
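A hedged sketch of Info_A(D), reusing the info() helper from the previous sketch; tuples are again assumed to be dicts holding attribute values and a "class" key:

# Minimal sketch: Info_A(D), the weighted entropy after partitioning D on attribute A.
def info_A(D, A, class_key="class"):
    n = len(D)
    total = 0.0
    for value in {t[A] for t in D}:                  # one partition Dj per observed outcome of A
        Dj = [t for t in D if t[A] == value]
        total += (len(Dj) / n) * info([t[class_key] for t in Dj])   # weight |Dj| / |D|
    return total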
Information Gain
• The smaller the expected information (still) required, the greater the purity of the partitions.
• Information gain is defined as the difference between the original information requirement (i.e., based on just the proportion of classes) and the new requirement (i.e., obtained after partitioning on A). That is,
Gain(A) = Info(D) - Info_A(D)
• The attribute A with the highest information gain, Gain(A), is chosen as the splitting attribute at node N (see the sketch below).
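Combining the two helpers above, the gain of an attribute is simply their difference (a sketch, reusing info() and info_A()):

# Minimal sketch: information gain of attribute A over partition D.
def gain(D, A, class_key="class"):
    return info([t[class_key] for t in D]) - info_A(D, A, class_key)

# The splitting attribute at node N is the one maximizing the gain, e.g.:
# best_attribute = max(attribute_list, key=lambda A: gain(D, A))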
Information Gain
• For information gain of an attribute that is continuous valued,
determine the “best” split-point for A, where the split-point is a
threshold on A.
• Sort the values of A in increasing order.
• Typically, the midpoint between each pair of adjacent values is
considered as a possible split-point. Therefore, given v values
of A, then v - 1 possible splits are evaluated.
• For each possible split-point for A, evaluate Info_A(D), where the number of partitions is two, that is, v = 2 (i.e., j = 1, 2).
• The point with the minimum expected information requirement for A is selected as the split-point for A.
• D1 is the set of tuples in D satisfying A ≤ split-point, and D2 is the set of tuples in D satisfying A > split-point (a sketch of this search follows below).
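A hedged sketch of that search, reusing info() from above: sort the values, try the midpoint of each adjacent pair, and keep the split-point with the minimum expected information requirement.

# Minimal sketch: best split-point for a continuous-valued attribute A.
def best_split_point(D, A, class_key="class"):
    values = sorted({t[A] for t in D})
    best_info, best_sp = float("inf"), None
    for lo, hi in zip(values, values[1:]):           # v values -> v - 1 candidate midpoints
        sp = (lo + hi) / 2.0
        D1 = [t for t in D if t[A] <= sp]            # A <= split-point
        D2 = [t for t in D if t[A] > sp]             # A >  split-point
        n = len(D)
        expected = (len(D1) / n) * info([t[class_key] for t in D1]) \
                 + (len(D2) / n) * info([t[class_key] for t in D2])
        if expected < best_info:
            best_info, best_sp = expected, sp
    return best_sp, best_info                        # split-point with minimum Info_A(D)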
Gain Ratio
• The information gain measure is biased toward tests with
many outcomes. That is, it prefers to select attributes
having a large number of values.
• Gain ratio attempts to overcome this bias. It applies a kind of normalization to information gain using a “split information” value, defined analogously with Info(D) as
SplitInfo_A(D) = -\sum_{j=1}^{v} (|D_j| / |D|) \times \log_2(|D_j| / |D|)
• This value represents the potential information generated by splitting the training data set, D, into v partitions, corresponding to the v outcomes of a test on attribute A.
• Note that, for each outcome, it considers the number of tuples having that outcome with respect to the total number of tuples in D.
Gain Ratio
GainRatio(A) = Gain(A) / SplitInfo_A(D)
• The attribute with the maximum gain ratio is selected as the splitting attribute (see the sketch below).
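A hedged sketch of the gain ratio, reusing log2, gain(), and the dict-based tuples assumed in the earlier sketches:

# Minimal sketch: split information and gain ratio for attribute A.
def split_info(D, A):
    n = len(D)
    total = 0.0
    for value in {t[A] for t in D}:
        w = sum(1 for t in D if t[A] == value) / n   # |Dj| / |D|
        total -= w * log2(w)
    return total

def gain_ratio(D, A, class_key="class"):
    si = split_info(D, A)
    return gain(D, A, class_key) / si if si > 0 else 0.0   # guard: SplitInfo can be 0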
Gini index
• The Gini index measures the impurity of D, a data partition or set of training tuples, as
Gini(D) = 1 - \sum_{i=1}^{m} p_i^2
where p_i is the probability that a tuple in D belongs to class C_i and is estimated by |C_{i,D}| / |D|. The sum is computed over m classes.
• The Gini index considers a binary split for each attribute (a sketch follows below).
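A minimal sketch of Gini(D) from a list of class labels, reusing Counter from the entropy sketch (the 9/5 split is again just an illustration):

# Minimal sketch: Gini impurity of a partition D from its class labels.
def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

print(round(gini(["yes"] * 9 + ["no"] * 5), 3))    # 9 yes / 5 no tuples -> 0.459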
Gini index
• When considering a binary split, we compute a weighted sum of the impurity of each resulting partition. For example, if a binary split on A partitions D into D1 and D2, the Gini index of D given that partitioning is
Gini_A(D) = (|D_1| / |D|) Gini(D_1) + (|D_2| / |D|) Gini(D_2)
• For each attribute, each of the possible binary splits is
considered.
• For a discrete-valued attribute, the subset that gives the
minimum Gini index for that attribute is selected as its
splitting subset.
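A hedged sketch of this subset search for a discrete-valued attribute, reusing gini() above; it enumerates the candidate value subsets (excluding the empty and full sets) and keeps the one with the minimum weighted Gini index:

# Minimal sketch: weighted Gini index of a binary split of D on discrete attribute A,
# where tuples with t[A] in `subset` go to D1 and all others to D2.
from itertools import combinations

def gini_A(D, A, subset, class_key="class"):
    D1 = [t for t in D if t[A] in subset]
    D2 = [t for t in D if t[A] not in subset]
    n = len(D)
    return (len(D1) / n) * gini([t[class_key] for t in D1]) \
         + (len(D2) / n) * gini([t[class_key] for t in D2])

def best_gini_subset(D, A, class_key="class"):
    values = sorted({t[A] for t in D})
    best_score, best_subset = float("inf"), None
    for r in range(1, len(values)):                  # proper, non-empty subsets of A's values
        for subset in combinations(values, r):
            score = gini_A(D, A, set(subset), class_key)
            if score < best_score:
                best_score, best_subset = score, subset
    return best_subset, best_score                   # splitting subset with minimum Gini_A(D)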
Gini index
• For continuous-valued attributes, each possible split-point must
be considered.
• The midpoint between each pair of (sorted) adjacent values is
taken as a possible split-point.
• The point giving the minimum Gini index for a given
(continuous-valued) attribute is taken as the split-point of that
attribute.
• D1 is the set of tuples in D satisfying A ≤ split-point, and D2 is the set of tuples in D satisfying A > split-point.
• The reduction in impurity that would be incurred by a binary split on a discrete- or continuous-valued attribute A is
\Delta Gini(A) = Gini(D) - Gini_A(D)
• The attribute that maximizes the reduction in impurity (or, equivalently, has the minimum Gini index) is selected as the splitting attribute.
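The impurity reduction itself is a one-line helper on top of the gini() and gini_A() sketches above:

# Minimal sketch: reduction in impurity from a binary split of D on attribute A.
def delta_gini(D, A, subset, class_key="class"):
    return gini([t[class_key] for t in D]) - gini_A(D, A, subset, class_key)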
Tree Pruning
• When a decision tree is built, many of the branches will
reflect anomalies in the training data due to noise or
outliers.
• Tree pruning methods address this problem of overfitting
the data. Such methods typically use statistical measures
to remove the least-reliable branches.
• Pruned trees tend to be smaller and less complex and,
thus, easier to comprehend.
• They are usually faster and more accurate at classifying independent test data (i.e., previously unseen tuples) than unpruned trees.
Tree Pruning
• In the prepruning approach, a tree is “pruned” by halting
its construction early (e.g., by deciding not to further split
or partition the subset of training tuples at a given node).
• Upon halting, the node becomes a leaf. The leaf may hold
the most frequent class among the subset tuples or the
probability distribution of those tuples.
• When constructing a tree, measures such as statistical
significance, information gain, Gini index, and so on, can
be used to assess the goodness of a split.
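In practice, prepruning is often applied through growth-limiting hyperparameters. The sketch below uses scikit-learn's DecisionTreeClassifier on the Iris data purely as stand-in data; the threshold values are illustrative assumptions, not recommendations.

# Illustrative sketch: prepruning by halting tree construction early.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X_train, y_train = load_iris(return_X_y=True)        # stand-in training data

prepruned = DecisionTreeClassifier(
    max_depth=4,                 # stop splitting beyond this depth
    min_samples_split=20,        # do not split nodes holding fewer tuples than this
    min_impurity_decrease=0.01,  # require at least this much impurity reduction per split
)
prepruned.fit(X_train, y_train)
print(prepruned.get_depth(), prepruned.get_n_leaves())   # the halted tree stays small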
Tree Pruning
• The second and more common approach is postpruning,
which removes subtrees from a “fully grown” tree.
• A subtree at a given node is pruned by removing its
branches and replacing it with a leaf.
• The leaf is labeled with the most frequent class among the tuples of the subtree being replaced.
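One widely used postpruning method is cost-complexity pruning (as in CART). The sketch below shows how scikit-learn exposes it via ccp_alpha; the Iris data and the particular choice of alpha are illustrative only, and in practice alpha would be selected on held-out validation data.

# Illustrative sketch: postpruning a fully grown tree by cost-complexity pruning.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X_train, y_train = load_iris(return_X_y=True)        # stand-in training data

full_tree = DecisionTreeClassifier(random_state=0)
path = full_tree.cost_complexity_pruning_path(X_train, y_train)   # candidate alphas

# Larger ccp_alpha removes more subtrees (replacing each with a leaf).
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=path.ccp_alphas[-2])
pruned.fit(X_train, y_train)
print(pruned.get_n_leaves())                          # fewer leaves than the full tree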