Decision Trees in Machine Learning
Sabber Ahamed
sabbers@gmail.com
linkedin.com/in/sabber-ahamed
github.com/msahamed
July 31, 2023
Contents

1 Introduction to Trees in Machine Learning
  1.1 Basics of a Decision Tree
  1.2 How does a Decision Tree work?
    1.2.1 Gini Impurity
    1.2.2 Entropy
    1.2.3 Information Gain
2 From a Single Tree to Forests
  2.1 Benefits of a Forest
  2.2 Building a Forest: Bagging and Boosting
    2.2.1 Bagging
    2.2.2 Boosting
3 Conclusion
1 Introduction to Trees in Machine Learning
Decision trees are one of the most intuitive and versatile algorithms in ma-
chine learning. They can be visualized, which aids in understanding the
decision-making process. They can easily handle both numerical and cate-
gorical data and can be used for both classification and regression problems.
1.1 Basics of a Decision Tree
The tree can be explained by two entities: decision nodes and leaves. The
decision nodes are where the data is split based on a certain attribute/feature,
and the leaves are the final outcomes (decisions). The diagram below shows a
simple decision tree, with decision nodes drawn as squares and leaves drawn
as circles.
Figure 1: A basic decision tree: The decision nodes are represented by
squares, and the leaves are represented by circles. The decision nodes are
where the data is split based on a certain attribute/feature. The leaves are
the final outcomes.
1.2 How does a Decision Tree work?
A decision tree is like a flowchart: you ask a series of questions about your
data, and based on the answers, you make a decision. The aim is to make
these decisions as accurately as possible.
Here’s a simple way to understand how a decision tree works:
1. Start with your entire dataset. This is like having a mixed bowl of
fruits and wanting to separate them.
2. Ask a question about one of the features of your data. For example,
”Is this fruit yellow?”
3. Based on the answer, split your dataset into two groups. Maybe now
you’ve separated bananas from the rest.
4. Repeat this process for each group, asking new questions and making
further splits. Continue this until you’ve separated out all the different
fruits or met some stopping criteria, like having a very small group.
The goal is to ask the right questions in the right order so that our fruits
(or any data) are separated as quickly and cleanly as possible. In technical
terms, we’re trying to find features that produce the ”purest” groups.
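To make this flowchart concrete, here is a minimal sketch using scikit-learn's
DecisionTreeClassifier; the tiny fruit dataset and its feature names are made
up purely for illustration.

# A minimal sketch of the "ask questions, split, repeat" process using
# scikit-learn's DecisionTreeClassifier. The fruit data below is made up
# purely for illustration.
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical features: [is_yellow (0 or 1), weight_in_grams]
X = [[1, 120], [1, 130], [0, 150], [0, 160], [1, 40], [0, 10]]
y = ["banana", "banana", "apple", "apple", "lemon", "grape"]

tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
tree.fit(X, y)

# Print the learned sequence of questions (splits) and the final leaves.
print(export_text(tree, feature_names=["is_yellow", "weight"]))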
Several popular splitting criteria help decide which questions to ask and in
what order. They include:
1.2.1 Gini Impurity:
Gini impurity measures the disorder of a set of items. The higher the Gini
impurity, the more mixed (disordered) the set is. When choosing a split, we
prefer the feature whose split produces the largest drop in impurity, i.e.,
the purest child groups.
It is calculated by subtracting the sum of the squared probabilities p_i of each
class i from one. The formula is given by:

Gini(t) = 1 - \sum_{i=1}^{C} p_i^2
Where:
• C is the number of classes.
• p_i is the proportion of instances that belong to class i in the set.
Example: For a binary feature (e.g., number of yellow and red balls) where
the probabilities of both classes are equal, p_1 = p_2 = 0.5, the Gini impurity
will be:

Gini(t) = 1 - (0.5^2 + 0.5^2) = 0.5
Benefits:
• Computationally faster as it doesn’t involve logarithmic functions like
entropy.
• Tends to produce slightly larger trees than entropy, but the trees are
more balanced.
Application: Used widely in the CART (Classification and Regression
Trees) algorithm.
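As a quick check of the formula and the example above, here is a small helper
(not from the original text) that computes Gini impurity from class counts:

# Gini(t) = 1 - sum_i p_i^2, where p_i are the class proportions.
def gini_impurity(counts):
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

print(gini_impurity([5, 5]))   # equal yellow/red balls -> 0.5
print(gini_impurity([10, 0]))  # a pure set -> 0.0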
1.2.2 Entropy:
Entropy is a measure of the randomness, impurity, or unpredictability of a
variable. The higher the entropy, the more mixed the set is; when splitting,
we prefer the feature whose split reduces entropy the most.
It is calculated by summing the product of the probability of each class p_i
and the logarithm of that probability:
Entropy(t) = -\sum_{i=1}^{C} p_i \log_2(p_i)
Where:
• C is the number of classes.
• p_i is the proportion of instances that belong to class i in the set.
Example: For a binary feature where the probabilities of both classes are
equal, p_1 = p_2 = 0.5, the entropy will be:

Entropy(t) = -[0.5 \log_2(0.5) + 0.5 \log_2(0.5)] = 1
For a variable with three classes, p_1 = 0.5, p_2 = 0.25, p_3 = 0.25, the
entropy will be:

Entropy(t) = -[0.5 \log_2(0.5) + 0.25 \log_2(0.25) + 0.25 \log_2(0.25)] = 1.5
For continuous variables, entropy is computed either by splitting on a
particular threshold value and taking the weighted average of the entropies of
the resulting subsets, or by first discretizing the variable into several
categories.
Benefits:
• Offers a clear measure of impurity or disorder.
• When used as a criterion, it tends to produce slightly smaller trees than
Gini impurity.
Application: Used widely in the ID3, C4.5 and C5.0 tree generation algo-
rithms.
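A similar sketch for entropy, checked against the two worked examples above:

# Entropy(t) = -sum_i p_i * log2(p_i); terms with p_i = 0 contribute 0.
from math import log2

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))         # binary, equal classes -> 1.0
print(entropy([0.5, 0.25, 0.25]))  # three classes -> 1.5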
1.2.3 Information Gain:
Information Gain is a measure of the effectiveness of an attribute in classi-
fying the training data. It’s calculated as the difference between the entropy
of the original set and the weighted average entropy of the subsets.
IG(A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} Entropy(S_v)
Where:
• Values(A) is the set of all possible values of attribute A.
• S is the total dataset.
• S_v is the subset of S for which attribute A has value v.
Example: If splitting a dataset on an attribute results in two subsets with
entropies Entropy(S_1) = 0.8 and Entropy(S_2) = 0.6, and these subsets
represent 30% and 70% of the data, respectively, then the information gain
from the split is:

IG(A) = Entropy(S) - [0.3 × 0.8 + 0.7 × 0.6] = Entropy(S) - 0.66
Benefits:
• Directly aims to reduce the unpredictability in the dataset.
• Often results in smaller, more coherent trees.
• One caveat: it favors attributes with many distinct values; this bias can be
mitigated by using the Gain Ratio, which normalizes the Information Gain.
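The worked example above can be checked with a small helper; the parent
entropy is not given there, so a value of 1.0 (a balanced binary set) is
assumed here purely for illustration:

# IG = Entropy(S) - sum_v (|S_v|/|S|) * Entropy(S_v)
def information_gain(parent_entropy, subset_weights, subset_entropies):
    weighted = sum(w * e for w, e in zip(subset_weights, subset_entropies))
    return parent_entropy - weighted

# Assuming a parent entropy of 1.0 (balanced binary set):
print(information_gain(1.0, [0.3, 0.7], [0.8, 0.6]))  # 1.0 - 0.66 ≈ 0.34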
2 From a Single Tree to Forests
Think of a decision tree as a lone decision-maker. While they might make
good decisions most of the time, they have their biases and might not always
get it right. Now, imagine if you had a team of such decision-makers, each
bringing their unique perspective and strengths. They all debate and vote
on the best decision. The chances are that this group, or ’forest’, will make
better overall decisions than a lone tree. This is the fundamental idea behind
the Random Forest and other ensemble techniques.
2.1 Benefits of a Forest
• Diversity in Opinion: Just as a team with diverse expertise can
tackle complex problems better, a forest has multiple trees that look
at the data differently, leading to better overall predictions.
• Reduces Overfitting: While a single tree might memorize (or overfit
to) the training data, averaging predictions from many trees can reduce
this overfitting.
• Improved Accuracy: Multiple trees often lead to predictions that
are more accurate than those from a single tree.
2.2 Building a Forest: Bagging and Boosting
Forests are formed using methods like bagging and boosting. These aren’t
just random collections of trees, but carefully constructed groups where each
tree brings something unique.
2.2.1 Bagging
Idea: Bagging (short for bootstrap aggregating) lets each tree in the forest
focus on a different part of the data. By training each tree on a random
subset, we ensure diversity.
Random Forest This is a popular algorithm that uses bagging. Each tree
in a Random Forest is built on a random bootstrap sample of the data.
Additionally, at each split during tree building, only a random subset of the
features is considered. This double randomness ensures a variety of trees.
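A minimal sketch of this double randomness using scikit-learn's
RandomForestClassifier; the synthetic dataset is only for illustration.

# Each tree sees a bootstrap sample of the rows, and each split considers
# only a random subset of the features (max_features).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,     # number of trees in the forest
    bootstrap=True,       # sample rows with replacement for each tree
    max_features="sqrt",  # random subset of features at each split
    random_state=0,
)
forest.fit(X, y)
print(forest.score(X, y))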
Figure 2: Simplified random forest: Each tree is trained on a random subset
of the data. The final prediction is obtained by averaging (for regression) or
majority voting (for classification) over all the trees. The image is taken
from Wikipedia.

Extra Trees Extra Trees (Extremely Randomized Trees) is another popular
tree ensemble. It's similar to Random Forests, but with two key differences
(a short sketch follows the list below):
• Extra Trees doesn’t bootstrap samples (meaning it doesn’t sample with
replacement), so each tree in Extra Trees is trained on the whole train-
ing set.
• Extra Trees uses random thresholds for each feature rather than search-
ing for the best possible thresholds (like a normal decision tree does).
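A short sketch of the scikit-learn counterpart, ExtraTreesClassifier,
highlighting these two differences; it assumes the same X, y as the Random
Forest sketch above.

# bootstrap=False (the default) keeps the full training set for each tree,
# and split thresholds are drawn at random rather than optimized.
from sklearn.ensemble import ExtraTreesClassifier

extra = ExtraTreesClassifier(
    n_estimators=100,
    bootstrap=False,  # each tree sees the whole training set
    random_state=0,
)
extra.fit(X, y)
print(extra.score(X, y))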
Isolation Forests Isolation Forests are a special type of tree-based ensemble
designed to detect outliers. They are based on the idea that outliers are few
and lie far from the rest of the data, so a randomly built tree needs fewer
splits to isolate an outlier than to isolate a typical point. This is the
basic idea behind Isolation Forests.
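A minimal outlier-detection sketch with scikit-learn's IsolationForest; the
two-dimensional data below are made up, a tight cluster plus one far-away
point.

import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(0)
# 100 points in a tight cluster, plus one obvious outlier at (8, 8).
X = np.vstack([rng.normal(0, 1, size=(100, 2)), [[8.0, 8.0]]])

iso = IsolationForest(n_estimators=100, contamination="auto", random_state=0)
labels = iso.fit_predict(X)  # -1 marks points that are isolated quickly (outliers)
print(labels[-1])            # the far-away point is expected to be labeled -1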
2.2.2 Boosting
Idea: Instead of letting each tree form its own opinion independently, make
trees learn from the mistakes of previous trees. Each new tree tries to correct
the errors made by the ensemble of existing trees. This way, the trees are
built sequentially, and each tree depends on the previous ones. This is called
boosting.
Boosting is a powerful technique that can lead to better predictions than
bagging. However, it’s also more prone to overfitting. Some popular boosting
algorithms are AdaBoost, Gradient Boosting, and XGBoost.
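A toy sketch of this sequential idea for regression with squared error,
assuming scikit-learn's DecisionTreeRegressor: each shallow tree is fit to the
residuals of the current ensemble, and its prediction is added with a small
learning rate. This illustrates the principle rather than any particular
library's implementation.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.1, size=200)

prediction = np.zeros_like(y)
learning_rate = 0.1
trees = []
for _ in range(100):
    residual = y - prediction              # what the ensemble still gets wrong
    tree = DecisionTreeRegressor(max_depth=2)
    tree.fit(X, residual)                  # new tree learns from the mistakes
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)

print(np.mean((y - prediction) ** 2))      # training error shrinks over rounds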
Figure 3: Schematic diagram showing gradient boosting: (a) the sequential
boosting process, (b) the LightGBM architecture, and (c) the XGBoost architecture.
Example: LightGBM LightGBM is a gradient boosting framework. In
LightGBM, trees are grown sequentially: each tree fits the residual errors (or
mistakes) of the ensemble of prior trees, so each new tree focuses on the
examples that previous trees got wrong. Unlike Random Forests, LightGBM
doesn't use bootstrap sampling. Instead, it uses a technique called
Gradient-based One-Side Sampling (GOSS) to select which samples
to use for training. It also differs from XGBoost in that it grows trees
leaf-wise instead of level-wise, which often leads to faster training and
comparable or better accuracy.
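A minimal usage sketch, assuming the lightgbm Python package and its
scikit-learn style wrapper LGBMClassifier are available; the dataset and
parameter values are illustrative.

from lightgbm import LGBMClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

model = LGBMClassifier(
    n_estimators=200,    # number of boosting rounds (trees built sequentially)
    learning_rate=0.05,  # shrinks each tree's contribution
    num_leaves=31,       # cap on leaves per tree under leaf-wise growth
    random_state=0,
)
model.fit(X, y)
print(model.score(X, y))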
3 Conclusion
Decision tree-based algorithms, from simple trees to complex structures like
Random Forests and LightGBM, offer a wide array of tools for machine learn-
ing practitioners. Their versatility and interpretability make them invaluable
in many real-world applications.