08 Decision Tree
The decision tree is one of the most popular machine learning algorithms in use. Decision trees can be applied to both classification and regression problems. For example, a tree can predict whether a student will play cricket or not (this example is referred to again below).
A decision tree is a tree in which each internal node represents a feature (attribute), each link (branch) represents a decision (rule), and each leaf represents an outcome (a categorical or continuous value).
Important Terminology in Decision Trees
Root Node: The node at which the data is first divided into two or more sets. The attribute used in this node is selected with an attribute selection technique.
Branch or Sub-Tree: A part of the entire decision tree is called a branch or sub-tree.
Splitting: Dividing a node into two or more sub-nodes based on if-else conditions.
Decision Node: A sub-node that is split into further sub-nodes is called a decision node.
Leaf or Terminal Node: A node that cannot be split into further sub-nodes; it marks the end of a path through the tree.
Pruning: Removing a sub-node (or sub-tree) from the tree is called pruning.
Types of Decision Trees
The type of decision tree depends on the type of target variable. There are two types:
1. Categorical Variable Decision Tree: A decision tree with a categorical target variable. Example: in the student problem above, the target variable is "Student will play cricket or not", i.e. YES or NO.
2. Continuous Variable Decision Tree: A decision tree with a continuous target variable.
Entropy
-> Entropy is used to select the best feature/attribute for splitting in a decision tree.
-> When a split produces several child nodes, we compute the weighted average of the child entropies and compare candidate splits to find the best one; this comparison is done through information gain (see the sketch below).
-> Information gain is the reduction in entropy (surprise) obtained by splitting the dataset on an attribute, and it is commonly used when training decision trees.
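As an illustration (not part of the original notebook), the following minimal sketch computes the entropy of a parent node and the information gain of a candidate binary split; the toy label arrays are made up for the example.

import numpy as np

def entropy(y):
    # Entropy = -sum(p * log2(p)) over the class proportions
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

# Toy example: a parent node with 10 labels, split into two child nodes
parent = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])  # 50/50 split, entropy = 1.0
left   = np.array([1, 1, 1, 1, 0])                 # left child after the split
right  = np.array([1, 0, 0, 0, 0])                 # right child after the split

# Weighted average of the child entropies
children = (len(left) * entropy(left) + len(right) * entropy(right)) / len(parent)

# Information gain = parent entropy - weighted child entropy
gain = entropy(parent) - children
print(f"parent entropy: {entropy(parent):.3f}, information gain: {gain:.3f}")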
Gini Impurity
-> Both entropy and Gini impurity measure the purity of a split in a decision tree.
-> Gini impurity is often preferred over entropy in practice, so many implementations use it by default to measure split purity.
-> Gini impurity is a measure of how likely a new instance of a random variable would be misclassified if it were randomly labelled according to the distribution of class labels in the dataset.
-> For a binary problem, the maximum value of entropy is 1, while the maximum value of Gini impurity is 0.5 (see the sketch below).
-> Because Gini impurity involves no logarithmic function, it is more efficient to compute than entropy.
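A small illustrative check (again not from the original notebook): computing Gini impurity and entropy on a perfectly mixed binary node reproduces the 0.5 and 1 maxima mentioned above.

import numpy as np

def gini(y):
    # Gini impurity = 1 - sum(p_i^2) over the class proportions
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(y):
    # Entropy = -sum(p_i * log2(p_i)) over the class proportions
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

# A perfectly mixed binary node (worst-case purity)
y = np.array([0, 0, 0, 1, 1, 1])
print(gini(y))     # 0.5 -> maximum Gini impurity for two classes
print(entropy(y))  # 1.0 -> maximum entropy for two classes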
Tree Pruning
Tree pruning is the method of trimming down a fully grown tree (obtained through the process above) to reduce its complexity and the variance of the model. Just as we regularise linear regression, we can also regularise the decision tree model by adding a penalty term for tree complexity.
Post-pruning
Post-pruning, also known as backward pruning, is the process in which the decision tree is generated first and the non-significant branches are then removed. A cross-validation set is used to check the effect of pruning and to test whether expanding a node brings an improvement. If there is an improvement, we continue expanding that node; if there is a reduction in accuracy, the node should not be expanded and should be converted into a leaf node.
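One common way to post-prune in scikit-learn is minimal cost-complexity pruning via the ccp_alpha parameter. The sketch below is illustrative rather than part of the original notebook; it assumes the X_train, y_train, X_test, y_test splits created later in this notebook and uses a single held-out split instead of full cross-validation for brevity.

from sklearn.tree import DecisionTreeClassifier

# Compute the effective alphas for minimal cost-complexity pruning of a full tree
tree = DecisionTreeClassifier(criterion='entropy', random_state=0)
path = tree.cost_complexity_pruning_path(X_train, y_train)

# Refit one tree per alpha and keep the one that scores best on the held-out data
best_alpha, best_score = 0.0, 0.0
for alpha in path.ccp_alphas:
    pruned = DecisionTreeClassifier(criterion='entropy', random_state=0, ccp_alpha=alpha)
    pruned.fit(X_train, y_train)
    score = pruned.score(X_test, y_test)
    if score > best_score:
        best_alpha, best_score = alpha, score
print(best_alpha, best_score)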
Pre-pruning
Pre-pruning, also known as forward pruning, stops non-significant branches from being generated in the first place. It uses a stopping condition to decide when the splitting of a branch should be terminated prematurely while the tree is being grown.
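In scikit-learn, pre-pruning corresponds to stopping hyperparameters such as max_depth, min_samples_split and min_samples_leaf; the values in this sketch are illustrative, and the training split is again assumed to exist as in the notebook code below.

from sklearn.tree import DecisionTreeClassifier

# Stop growth early instead of pruning afterwards:
#   max_depth         - never grow the tree deeper than this
#   min_samples_split - do not split a node containing fewer samples than this
#   min_samples_leaf  - every leaf must keep at least this many samples
pre_pruned = DecisionTreeClassifier(criterion='entropy',
                                    max_depth=4,
                                    min_samples_split=10,
                                    min_samples_leaf=5,
                                    random_state=0)
pre_pruned.fit(X_train, y_train)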
Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM), such as entropy/information gain or Gini impurity.
Step-3: Divide S into subsets that contain the possible values of the best attribute.
Step-4: Generate the decision tree node that contains the best attribute.
Step-5: Recursively make new decision trees using the subsets of the dataset created in Step-3. Continue this process until a stage is reached where the nodes cannot be classified further; such final nodes are called leaf nodes. A minimal sketch of this recursion is shown below.
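The following minimal sketch (not from the original notebook) implements the recursion described above, using entropy and information gain as the attribute selection measure; the names build_tree and best_split, and the toy data, are made up for the illustration. The rest of the notebook uses scikit-learn's DecisionTreeClassifier instead of a hand-rolled implementation.

import numpy as np

def entropy(y):
    # Entropy of a label array
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def best_split(X, y):
    # Return (feature index, threshold) with the highest information gain, or None
    best_gain, best = 0.0, None
    parent = entropy(y)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            if len(left) == 0 or len(right) == 0:
                continue
            child = (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
            if parent - child > best_gain:
                best_gain, best = parent - child, (j, t)
    return best

def build_tree(X, y, depth=0, max_depth=3):
    # Stop when the node is pure or the depth limit is reached -> majority-class leaf
    if len(np.unique(y)) == 1 or depth == max_depth:
        values, counts = np.unique(y, return_counts=True)
        return values[np.argmax(counts)]
    split = best_split(X, y)
    if split is None:  # no split improves purity -> leaf
        values, counts = np.unique(y, return_counts=True)
        return values[np.argmax(counts)]
    j, t = split
    mask = X[:, j] <= t
    return {"feature": j, "threshold": t,
            "left": build_tree(X[mask], y[mask], depth + 1, max_depth),
            "right": build_tree(X[~mask], y[~mask], depth + 1, max_depth)}

# Example usage with a toy dataset
X_toy = np.array([[1, 0], [2, 1], [3, 0], [4, 1]])
y_toy = np.array([0, 0, 1, 1])
print(build_tree(X_toy, y_toy))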
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Load the dataset; the features are Age and Estimated Salary (columns 2 and 3),
# and the target is the last column
dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, -1].values
Splitting the dataset into the Training set and Test set
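A typical train/test split for this notebook looks like the sketch below; the test_size and random_state values are assumptions (a 25% test split is consistent with the 100 test samples in the confusion matrix further down).

from sklearn.model_selection import train_test_split

# Hold out a quarter of the data for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)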
Feature Scaling
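Feature scaling is typically done with StandardScaler, fitted on the training set only; this is a sketch of the usual pattern rather than the exact original cell.

from sklearn.preprocessing import StandardScaler

# Standardise Age and Estimated Salary (fit on the training set, reuse on the test set)
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)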
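Given the fitted estimator shown in the output below, the training cell presumably looked like this sketch:

from sklearn.tree import DecisionTreeClassifier

# Fit a decision tree with entropy as the splitting criterion
classifier = DecisionTreeClassifier(criterion='entropy', random_state=0)
classifier.fit(X_train, y_train)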
Out[5]:
DecisionTreeClassifier(criterion='entropy', random_state=0)
# Predicting the Test set results
y_pred = classifier.predict(X_test)
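The confusion matrix shown below can be obtained with scikit-learn's confusion_matrix (a sketch of the usual call). In the output, 62 + 29 = 91 predictions are correct and 6 + 3 = 9 are incorrect.

from sklearn.metrics import confusion_matrix

# Rows are the true classes, columns the predicted classes
cm = confusion_matrix(y_test, y_pred)
print(cm)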
[[62 6]
[ 3 29]]
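A typical way to render the decision-boundary plots that follow is to evaluate the classifier on a dense grid over the two scaled features and overlay a scatter of the actual points; the colours and grid step in this sketch are illustrative. The test-set plot further below is produced in the same way with X_set, y_set = X_test, y_test.

from matplotlib.colors import ListedColormap

# Visualise the decision regions on the (scaled) training set
X_set, y_set = X_train, y_train
X1, X2 = np.meshgrid(np.arange(X_set[:, 0].min() - 1, X_set[:, 0].max() + 1, 0.01),
                     np.arange(X_set[:, 1].min() - 1, X_set[:, 1].max() + 1, 0.01))
# Predict the class of every grid point and colour the regions accordingly
plt.contourf(X1, X2,
             classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha=0.75, cmap=ListedColormap(('red', 'green')))
# Overlay the actual observations, coloured by their true class
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                color=('red', 'green')[i], label=j)
plt.title('Decision Tree (Training set)')
plt.xlabel('Age (scaled)')
plt.ylabel('Estimated Salary (scaled)')
plt.legend()
plt.show()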
[Figure: decision boundary of the Decision Tree classifier on the Training set (Age vs. Estimated Salary)]
The above output is completely different from that of the other classification models: the decision boundary consists of vertical and horizontal lines that split the dataset according to the Age and Estimated Salary variables.
As we can see, the tree is trying to capture every data point, which is a sign of overfitting.
[Figure: decision boundary of the Decision Tree classifier on the Test set (Age vs. Estimated Salary)]
As we can see in the above image, there are some green data points within the red region and vice versa; these are the incorrect predictions.