
Data Science Concepts

Lesson04–Decision Tree Concepts



Objective

After completing this lesson you will be able to:

• Explain decision trees and their applications.
• Explain the various parameters used to evaluate the outcome of decision trees.



Decision Trees

• Classification is the task of assigning objects to one of several pre-defined categories.


o Descriptive modelling: Can be used as an explanatory tool to distinguish between
objects of different classes.
o Predictive modelling: Can be used to predict the class label of unknown records.

Input: attribute set (x)  →  Classification model  →  Output: class label (y)

• The objective is to build a learning algorithm with good generalization capability.



Decision Tree–Concept Development

Example: classifying species as mammal or non-mammal.

Method    Algorithm           Split criterion
CART      Hunt's algorithm    Gini index
C5.0      Hunt's algorithm    Entropy
CHAID     CHAID algorithm     Chi-square test

Criteria for comparing the different methods: predictive accuracy, speed, robustness, scalability, and interpretability.
Decision Tree - CHAID

The hypotheses being tested are:

• H0: There is no relationship between the two variables (Y and the selected X).
• Ha: There is a relationship between the two variables (they are dependent).

$$\chi^2 = \sum \frac{(O - E)^2}{E}, \qquad \text{Expected frequency } E = \frac{\text{row sum} \times \text{column sum}}{\text{total sum}}$$

• Steps in the test:

o Examine each predictor variable for its statistical significance with the dependent variable.
o Determine the most significant predictor using the p-value (smallest p-value).
o Divide the data by the levels of the most significant predictor. Each of these groups is then examined individually.
o For each sub-group, determine the most significant variable among the remaining predictors and divide the data again.



Decision Tree - CHAID

The chi-square calculation is repeated for every X with respect to Y; the X with the smallest p-value is picked for the first split. The steps are then repeated for the second-level tree.

Contingency Table

Gender (X)    NPA Status (Y) = 1    NPA Status (Y) = 0    Total
Male (0)             135                   139              274
Female (1)           165                   561              726
Total                300                   700             1000

Calculation Table

Cell              Observed    Expected                      (O - E)^2 / E
M - (Status 1)       135      (274*300)/1000 =  82.2            33.91
M - (Status 0)       139      (274*700)/1000 = 191.8            14.53
F - (Status 1)       165      (726*300)/1000 = 217.8            12.80
F - (Status 0)       561      (726*700)/1000 = 508.2             5.48
Total               1000      1000                               66.73

p-value = 3.106E-16
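The statistic and p-value above can be reproduced in base R with chisq.test(); the following is a minimal sketch (not from the original slides), with Yates' continuity correction switched off so the result matches the hand calculation.

# Contingency table from the slide: Gender (rows) vs. NPA Status (columns)
observed <- matrix(c(135, 139,
                     165, 561),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(Gender = c("Male", "Female"),
                                   NPA    = c("1", "0")))

# correct = FALSE disables Yates' continuity correction, so the statistic
# matches the hand calculation (X-squared about 66.7, p about 3.1e-16)
chisq.test(observed, correct = FALSE)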



Decision Tree - CART

• CART (Classification and Regression Tree) always performs binary splits.


o The Gini index is a measure of impurity at a node: a completely homogeneous sample has low impurity, while a sample divided equally between the classes has high impurity.

$$i(t) = \text{Gini}(t) = \sum_{j=1}^{k} P(j \mid t)\,\bigl(1 - P(j \mid t)\bigr)$$
where $P(j \mid t)$ is the proportion of category $j$ at node $t$.

$$\text{Change in impurity: } \Delta i(t) = i(t) - P_L \, i(t_L) - P_R \, i(t_R)$$
where $P_L$ is the proportion of observations in the left branch and $P_R$ the proportion in the right branch.

o The variable (split) that maximizes the change in impurity is chosen when building the decision tree.
o For a two-category problem, the minimum Gini value is 0 and the maximum is 0.5 (50% zeros and 50% ones across the two categories).
o If a predictor has more than two classes (k classes), the classes are grouped into two sets and the Gini index is computed for each grouping:
$$\text{Number of combinations} = 2^{k-1} - 1$$
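As an illustration, a minimal R sketch of the Gini impurity and the change in impurity for a binary split (the function names gini and delta_impurity are ours, not from the slides):

# Gini impurity of a node, given the vector of class counts at that node
gini <- function(counts) {
  p <- counts / sum(counts)
  sum(p * (1 - p))
}

# Change in impurity when a parent node is split into left and right children
delta_impurity <- function(parent, left, right) {
  p_left  <- sum(left)  / sum(parent)
  p_right <- sum(right) / sum(parent)
  gini(parent) - p_left * gini(left) - p_right * gini(right)
}

gini(c(50, 50))    # 0.5: maximum impurity for two categories
gini(c(100, 0))    # 0.0: completely homogeneous node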



Decision Tree - CART

• Entropy is another measure to select the best split

$$\text{Entropy}(t) = -\sum_{j=1}^{k} P(j \mid t)\,\log_2 P(j \mid t)$$
where $P(j \mid t)$ is the proportion of category $j$ at node $t$.

o The variable (split) that maximizes the reduction in entropy (information gain) is chosen when building the decision tree.
o For a two-category problem, the minimum entropy value is 0 and the maximum is 1 (50% zeros and 50% ones across the two categories).
o If a predictor has more than two classes (k classes), the classes are grouped into two sets and the entropy is computed for each grouping:
$$\text{Number of combinations} = 2^{k-1} - 1$$
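A matching R sketch for entropy (illustrative only; zero-probability classes are dropped so that pure nodes return 0 rather than NaN):

# Entropy of a node, given the vector of class counts at that node
entropy <- function(counts) {
  p <- counts / sum(counts)
  p <- p[p > 0]              # drop empty classes to avoid log2(0)
  -sum(p * log2(p))
}

entropy(c(50, 50))   # 1: maximum entropy for two categories
entropy(c(100, 0))   # 0: completely homogeneous node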



Decision Tree - CART

Split of node t (D = 300, ND = 700) into t(L): Male (D = 135, ND = 139) and t(R): Female (D = 165, ND = 561).

Contingency Table

Gender (X)            NPA Status = 1 (D)    NPA Status = 0 (ND)    Total
Node t(L): Male              135                    139              274
Node t(R): Female            165                    561              726
Total                        300                    700             1000

Calculation Table

Node            Class proportions                  i(t) = Σ P(j|t)(1 - P(j|t))         Proportion of obs
t               P(D|t)   = 300/1000 = 0.30         0.30(0.70) + 0.70(0.30) = 0.42       1.000
                P(ND|t)  = 700/1000 = 0.70
t(L): Male      P(D|t(L))  = 135/274 = 0.49        0.49(0.51) + 0.51(0.49) ≈ 0.50       274/1000 = 0.274
                P(ND|t(L)) = 139/274 = 0.51
t(R): Female    P(D|t(R))  = 165/726 = 0.23        0.23(0.77) + 0.77(0.23) ≈ 0.35       726/1000 = 0.726
                P(ND|t(R)) = 561/726 = 0.77

Δi(t) = 0.42 − (0.274 × 0.50) − (0.726 × 0.35) ≈ 0.03
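The worked example can be checked with the gini() and delta_impurity() helpers sketched earlier; note that unrounded proportions give a slightly smaller change in impurity than the two-decimal arithmetic in the table.

parent <- c(D = 300, ND = 700)   # node t
left   <- c(D = 135, ND = 139)   # t(L): Male
right  <- c(D = 165, ND = 561)   # t(R): Female

gini(parent)                          # 0.42
delta_impurity(parent, left, right)   # approximately 0.028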



Decision Tree–Classification Matrix

Classification (confusion) matrix:

                            Predicted
Observed                    Class = 1 (Positive)    Class = 0 (Negative)
Class = 1 (Positive)        f11 = 4  [TP]           f10 = 3  [FN]
Class = 0 (Negative)        f01 = 0  [FP]           f00 = 17 [TN]

$$\text{Sensitivity} = \frac{TP}{TP + FN} = \frac{4}{7} = 57.1\%$$

$$\text{Specificity} = \frac{TN}{TN + FP} = \frac{17}{17} = 100\%$$

$$\text{Model accuracy} = \frac{TP + TN}{TP + TN + FP + FN} = \frac{21}{24} = 87.5\%$$

Sensitivity is the probability that the predicted class is 1 when the observed class is 1.
Specificity is the probability that the predicted class is 0 when the observed class is 0.
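A minimal R sketch of the three measures, using the counts from the matrix above:

TP <- 4; FN <- 3; FP <- 0; TN <- 17

sensitivity <- TP / (TP + FN)                    # 4/7   = 0.571
specificity <- TN / (TN + FP)                    # 17/17 = 1.000
accuracy    <- (TP + TN) / (TP + TN + FP + FN)   # 21/24 = 0.875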



Decision Tree–ROC Curve

• The receiver operating characteristic (ROC) curve is a useful way to determine the cut-off point that maximizes sensitivity and specificity.
• Sensitivity and specificity are computed for a sequence of cut-off points applied to the model for classifying observations as Positive or Negative.

An overall indication of the diagnostic accuracy of an ROC curve is the area under the curve (AUC). AUC values:

• 0.9–1.0 indicate perfect sensitivity and specificity,
• 0.8–0.9 indicate good sensitivity and specificity,
• 0.7–0.8 indicate fair sensitivity and specificity,
• 0.6–0.7 indicate poor sensitivity and specificity,
• 0.6 and below indicate an outcome no better than chance.
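An ROC curve and its AUC can be produced in R with, for example, the pROC package (an assumption: the slides do not name a package, and pROC must be installed separately). The vectors observed and score below are placeholders for a 0/1 outcome and the model's predicted probabilities.

library(pROC)

# observed: 0/1 outcomes; score: predicted probabilities from the model
roc_obj <- roc(response = observed, predictor = score)

plot(roc_obj)   # draw the ROC curve
auc(roc_obj)    # area under the curve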



Decision Tree–Gain Chart and Lift Chart

• Lift and gain charts measure how much better one can expect to do with the model compared with having no model.
• In contrast to the confusion/classification matrix, which evaluates the model on the whole population, gain and lift charts evaluate model performance on portions of the population.

Steps to build a gain/lift chart:

1. Randomly split the data into two samples, e.g. 80% training sample and 20% validation sample.
2. Score the validation sample (compute predicted probabilities) using the model built on the training sample.
3. Rank the scored file in descending order of predicted probability.
4. Split the ranked file into 10 sections (deciles) and count the number of events in each section.

Cumulative gains and lift charts are graphical representations of the advantage of using a predictive model to choose which customers to contact.
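A minimal base-R sketch of steps 3 and 4 above; score (predicted probabilities on the validation sample) and event (observed 0/1 outcomes) are placeholder vectors, not names from the slides.

ord    <- order(score, decreasing = TRUE)                  # step 3: rank by predicted probability
event  <- event[ord]
decile <- ceiling(10 * seq_along(event) / length(event))   # step 4: split into 10 sections

events_per_decile <- tapply(event, decile, sum)            # events counted in each decile
cum_gain <- cumsum(events_per_decile) / sum(event)         # cumulative gain at each decile
lift     <- cum_gain / (seq_len(10) / 10)                  # gain vs. random expectation (x% at decile x)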



Decision Tree–Gain Chart

• Gain at a given decile level is the ratio of cumulative number of targets (events) up to that
decile to the total number of targets (events) in the entire data set.

Source: http://www.listendata.com/2014/08/excel-template-gain-and-lift-charts.html
Decision Tree–Lift Chart

• Lift measures how much better one can expect to do with the model compared with having no model.
• It is the ratio of gain % to the random expectation at a given decile level. The random expectation at the xth decile is x%.

Interpretation:
By contacting only 10% of customers, one may reach 4.5 times as many responders as random selection would yield.

To build lift and gain charts in R, refer to:
https://heuristically.wordpress.com/2009/12/18/plot-roc-curve-lift-chart-random-forest/



Decision Tree–Under Fitting and Over Fitting

• Model underfitting:

o The model has not learned enough from the training data.
o Training and test error rates are both large when the tree is small.

• Model overfitting:

o The model has learned the training data too closely and does not generalize.
o As the number of nodes increases, the training error decreases but the test error may increase.
o The tree is more complex than needed.

Underfitting and overfitting both reduce generalizability, so such decision tree models may not classify unseen cases correctly.



Decision Tree–Pruning

Pruning is applied to overcome overfitting in the decision tree model.

Pre-pruning

Stop the algorithm before it becomes a fully grown tree:
o Stop if the number of instances is less than some user-specified threshold.
o Stop if expanding the current node does not improve the impurity measure (e.g., Gini or information gain) by at least some threshold.
This is more efficient but less accurate.

Post-pruning

Grow the decision tree to its entirety, then trim the nodes of the tree in a bottom-up fashion:
o If the generalization error improves after trimming, replace the sub-tree with a leaf node.
o The class label of the leaf node is determined from the majority class of instances in the sub-tree.
This is more accurate but less efficient.

Misclassification error pruning: decision tree pruning stops when the number of cases in a terminal node falls below a threshold.
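As an illustration of post-pruning, a minimal sketch with the rpart package (an assumption: the slides do not name a package). rpart prunes on the cost-complexity parameter cp; here a tree is deliberately over-grown and then cut back at the cp value with the lowest cross-validated error.

library(rpart)

# Deliberately over-grow a tree (cp = 0 disables the complexity stopping rule)
fit <- rpart(Species ~ ., data = iris, method = "class",
             control = rpart.control(cp = 0, minsplit = 2))

# Prune back to the cp value with the lowest cross-validated error
best_cp <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
pruned  <- prune(fit, cp = best_cp)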



Decision Tree in R Using an Example
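The original example slides are not reproduced in this extract; the following is a minimal sketch of fitting a classification tree in R with the rpart package, using its built-in kyphosis data set (an assumption: the original lesson may have used a different data set or package).

library(rpart)

# Fit a classification tree on the kyphosis data shipped with rpart
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

printcp(fit)                                       # complexity table and cross-validated error
plot(fit, margin = 0.1); text(fit, use.n = TRUE)   # draw the tree

pred <- predict(fit, kyphosis, type = "class")
table(Observed = kyphosis$Kyphosis, Predicted = pred)   # classification matrix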



Summary

Summary of the topics covered in this lesson:

• Decision trees are one of the most widely used data mining techniques.
• The outcome of a decision tree can be used for data exploration as well as for building a predictive model.
• Unlike regression and logistic regression models, there are no statistical attributes that can suggest whether a decision tree model is good and generalizable.



QUIZ TIME



Quiz Question 1

Quiz 1: Which of the statements below are correct? Select all that apply.

a. Sensitivity is the probability that predicted class is 1 when observed class is 1.

b. Specificity is the probability that the predicted class is 1 when the observed class is 0.

c. Specificity is the probability that the predicted class is 0 when the observed class is 0.

d. Sensitivity is the probability that predicted class is 0 when observed class is 1.



Quiz Question 1

Quiz 1: Which of the statements below are correct? Select all that apply.

a. Sensitivity is the probability that predicted class is 1 when observed class is 1.

b. Specificity is the probability that the predicted class is 1 when the observed class is 0.

c. Specificity is the probability that the predicted class is 0 when the observed class is 0.

d. Sensitivity is the probability that predicted class is 0 when observed class is 1.

Correct answers: a and c. (Statements b and d are incorrect.)



End of Lesson04–Decision Tree Concepts

© Copyright 2015 All rights reserved.
