Decision Tree

The document discusses decision tree algorithms. It describes how decision trees are constructed in a top-down recursive manner by splitting the training examples at each node based on an attribute. It discusses stopping conditions for partitioning and describes how decision trees are used for classification. The document also discusses different measures that can be used for attribute selection such as information gain, gain ratio, and Gini index.


Decision Tree Algorithm

• Tree is constructed in a top-down recursive divide-and-conquer manner.
• At start, all the training examples are at the root.
• Attributes are categorical (if continuous-valued, they are discretized in advance).

Conditions for stopping partitioning
• All samples for a given node belong to the same class.
• There are no remaining attributes for further partitioning – majority voting is employed for classifying the leaf.
• There are no samples left.
Decision Trees – Testing Phase
• Given a tuple, X, for which the associated class label is unknown, the attribute values of the tuple are tested against the decision tree.
• A path is traced from the root to a leaf node, which holds the class prediction for that tuple.
• Decision trees can easily be converted to classification rules.
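As a concrete illustration of this root-to-leaf traversal, here is a minimal Python sketch. The nested-dict tree layout, the attribute names, and the classify() helper are illustrative assumptions, not a representation prescribed by the slides.

# Minimal sketch: classify a tuple by walking a decision tree from root to leaf.
# The tree layout (nested dicts) and attribute names are illustrative assumptions.

def classify(tree, tuple_x):
    """Trace a path from the root to a leaf and return its class prediction."""
    # A leaf is represented directly by its class label (a plain string).
    if not isinstance(tree, dict):
        return tree
    attribute = tree["attribute"]                   # attribute tested at this internal node
    branch = tree["branches"][tuple_x[attribute]]   # follow the branch matching the tuple's value
    return classify(branch, tuple_x)                # recurse until a leaf is reached

# Illustrative tree for the classic weather data:
weather_tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {"attribute": "humidity",
                  "branches": {"high": "no", "normal": "yes"}},
        "overcast": "yes",
        "rain": {"attribute": "windy",
                 "branches": {True: "no", False: "yes"}},
    },
}

print(classify(weather_tree, {"outlook": "sunny", "humidity": "normal"}))  # -> yes

Each internal node tests one attribute and each branch corresponds to one outcome of that test, which is also why such a tree can be read off directly as a set of IF-THEN classification rules.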
Why are decision tree classifiers so popular?
• Construction of decision tree classifiers does not require any domain knowledge or parameter setting.
• Can handle multidimensional data.
• Easy for humans to assimilate.
• Learning and classification are simple and fast.
• In general, decision tree classifiers have good accuracy.
• However, successful use may depend on the data at hand.
• Application areas: medicine, manufacturing and production, financial analysis, astronomy, and molecular biology.
WHICH ATTRIBUTE TO SPLIT ON?
Attribute Selection Measures (Splitting Rules)
• Information Gain
• Gain Ratio
• Gini Index

• The Gini index enforces the resulting tree to be binary.
• Information gain does not, thereby allowing multiway splits (i.e., two or more branches to be grown from a node).
Information Gain (ID3/C4.5)
• Select the attribute with the highest information gain.
• Let p_i be the probability that an arbitrary tuple in D belongs to class C_i, estimated by |C_i,D| / |D|.
• Expected information (entropy) needed to classify a tuple in D:

  Info(D) = − Σ_{i=1}^{m} p_i log2(p_i)

• Information needed (after using A to split D into v partitions) to classify D:

  Info_A(D) = Σ_{j=1}^{v} (|D_j| / |D|) × Info(D_j)

• Information gained by branching on attribute A:

  Gain(A) = Info(D) − Info_A(D)
• Expected information (entropy) needed to classify a tuple in D: Info(D).
• Information needed (after using A to split D into v partitions) to classify D: Info_A(D).
• Information gain is defined as the difference between the original information requirement (i.e., based on just the proportion of classes) and the new requirement (i.e., obtained after partitioning on A).
Further, the Rain branch of the example is handled in the same way as the Sunny branch.
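A small Python sketch of the three formulas above, assuming the data set D is represented as a list of (attribute-dict, class-label) pairs; the function names and the data layout are assumptions made for illustration.

# Sketch of Info(D), Info_A(D), and Gain(A); D is assumed to be a list of
# (attribute-dict, class-label) pairs.
from collections import Counter
from math import log2

def info(D):
    """Expected information (entropy) needed to classify a tuple in D."""
    total = len(D)
    counts = Counter(label for _, label in D)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def info_after_split(D, A):
    """Information still needed to classify D after splitting it on attribute A."""
    total = len(D)
    partitions = {}
    for x, label in D:
        partitions.setdefault(x[A], []).append((x, label))
    return sum(len(Dj) / total * info(Dj) for Dj in partitions.values())

def gain(D, A):
    """Information gained by branching on attribute A."""
    return info(D) - info_after_split(D, A)

In ID3, the attribute chosen at a node is then simply the one maximizing gain(D, A) over the remaining attributes.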
Algorithm for Decision Tree Induction
• Basic algorithm (a greedy algorithm)
  • Tree is constructed in a top-down recursive divide-and-conquer manner
  • At start, all the training examples are at the root
  • Attributes are categorical (if continuous-valued, they are discretized in advance)
  • Examples are partitioned recursively based on selected attributes
  • Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)
• Conditions for stopping partitioning
  • All samples for a given node belong to the same class
  • There are no remaining attributes for further partitioning – majority voting is employed for classifying the leaf
  • There are no samples left
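Below is a minimal sketch of this greedy, top-down, divide-and-conquer loop. It reuses the gain() helper from the earlier information-gain sketch as the heuristic selection measure; as before, the data layout and helper names are assumptions for illustration, not the canonical algorithm from the slides.

# Greedy top-down induction sketch; reuses info()/gain() from the previous sketch.
from collections import Counter

def majority_class(D):
    """Majority voting over the class labels in D."""
    return Counter(label for _, label in D).most_common(1)[0][0]

def build_tree(D, attributes):
    labels = {label for _, label in D}
    # Stopping condition 1: all samples at this node belong to the same class.
    if len(labels) == 1:
        return labels.pop()
    # Stopping condition 2: no remaining attributes -> majority voting at the leaf.
    if not attributes:
        return majority_class(D)
    # Select the test attribute with the highest information gain.
    best = max(attributes, key=lambda A: gain(D, A))
    remaining = [A for A in attributes if A != best]
    # Partition the examples on the selected attribute and recurse on each partition.
    # Stopping condition 3 (no samples left for a branch) cannot arise in this sketch,
    # since branches are created only for attribute values that actually occur in D.
    partitions = {}
    for x, label in D:
        partitions.setdefault(x[best], []).append((x, label))
    return {"attribute": best,
            "branches": {value: build_tree(Dj, remaining)
                         for value, Dj in partitions.items()}}

The resulting nested-dict tree has the same shape as the one used in the earlier classify() sketch, so the two pieces can be combined for training and prediction.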
Gain Ratio

• The information gain measure is biased toward tests with many outcomes.
• It prefers to select attributes having a large number of values.
• C4.5, a successor of ID3, uses an extension of information gain – the gain ratio – to overcome this bias.
• It applies a kind of normalization to information gain using a “split information” value:

  SplitInfo_A(D) = − Σ_{j=1}^{v} (|D_j| / |D|) × log2(|D_j| / |D|)

• This value represents the potential information generated by splitting the training data set, D, into v partitions, corresponding to the v outcomes of a test on attribute A.
• For each outcome, it considers the number of tuples having that outcome with respect to the total number of tuples in D.
• It differs from information gain, which measures the information with respect to classification that is acquired based on the same partitioning.
• The gain ratio is defined as

  GainRatio(A) = Gain(A) / SplitInfo_A(D)

• The attribute with the maximum gain ratio is selected as the splitting attribute.
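A sketch of the split information and gain ratio computations under the same assumed list-of-(attribute-dict, class-label) data layout, reusing the gain() helper from the information-gain sketch.

# Sketch of SplitInfo_A(D) and GainRatio(A); reuses gain() from the earlier sketch.
from math import log2

def split_info(D, A):
    """Potential information generated by splitting D into v partitions on A."""
    total = len(D)
    counts = {}
    for x, _ in D:
        counts[x[A]] = counts.get(x[A], 0) + 1
    return -sum((c / total) * log2(c / total) for c in counts.values())

def gain_ratio(D, A):
    """Normalize information gain by the split information, as in C4.5."""
    si = split_info(D, A)
    return gain(D, A) / si if si > 0 else 0.0  # guard against zero split information

# The splitting attribute is then the one with the maximum gain ratio:
# best = max(attributes, key=lambda A: gain_ratio(D, A))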
Gini Index
• The Gini index measures the impurity of D, a data partition or set of training tuples:

  Gini(D) = 1 − Σ_{i=1}^{m} p_i²

• where p_i is the probability that a tuple in D belongs to class C_i.
• The sum is computed over m classes.
• The Gini index considers a binary split for each attribute.
• The Gini index is used in CART.
• When considering a binary split, we compute a weighted sum of the impurity of each resulting partition.
• For example, if a binary split on A partitions D into D1 and D2, the Gini index of D given that partitioning is

  Gini_A(D) = (|D1| / |D|) × Gini(D1) + (|D2| / |D|) × Gini(D2)

• The reduction in impurity that would be incurred by a binary split on a discrete- or continuous-valued attribute A is

  ΔGini(A) = Gini(D) − Gini_A(D)

• The attribute that maximizes the reduction in impurity (or, equivalently, has the minimum Gini index) is selected as the splitting attribute.
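A sketch of the Gini computations above under the same assumed data layout; the subset argument, which says which attribute values fall on the D1 side of the binary split, is an illustrative assumption.

# Sketch of Gini(D), the weighted Gini index of a binary split, and the
# reduction in impurity; D is a list of (attribute-dict, class-label) pairs.
from collections import Counter

def gini(D):
    """Gini impurity of a data partition D, summed over the m classes."""
    total = len(D)
    counts = Counter(label for _, label in D)
    return 1.0 - sum((c / total) ** 2 for c in counts.values())

def gini_of_split(D, A, subset):
    """Weighted Gini index when D is split into D1 (x[A] in subset) and D2 (the rest)."""
    D1 = [(x, y) for x, y in D if x[A] in subset]
    D2 = [(x, y) for x, y in D if x[A] not in subset]
    total = len(D)
    return len(D1) / total * gini(D1) + len(D2) / total * gini(D2)

def gini_reduction(D, A, subset):
    """Reduction in impurity incurred by the binary split; CART maximizes this."""
    return gini(D) - gini_of_split(D, A, subset)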
Additional Terms
• When decision trees are built, many of the branches may reflect noise or outliers in the training data.
• Tree pruning attempts to identify and remove such branches, with the goal of improving classification accuracy on unseen data.
• Scalability becomes an issue when decision trees must be induced from very large training sets.
• Incremental versions of decision tree induction have also been proposed.
• When given new training data, these restructure the decision tree acquired from learning on previous training data, rather than relearning a new tree from scratch.
