● Linear Models
○ Logistic Regression
○ Support Vector Machines
● Nonlinear Models
○ K-nearest Neighbors (KNN)
○ Support Vector Machines (SVM)
○ Naïve Bayes
○ Decision Tree Classification
○ Random Forest Classification
Decision trees
Overview
General algorithm
[Figure: classification maps an input variable to a class label; the model predicts a class, Yes or No.]
Decision trees- Overview
● A decision tree uses a tree structure to specify sequences of decisions and their consequences.
https://medium.com/cracking-the-data-science-interview/an-introduction-to-big-data-decision-trees-aae6a3587f59
Decision trees- Overview
[Figure: a decision tree; the leaf nodes carry the class labels, Yes or No.]
Decision trees- Overview
● Internal nodes are the decision or test points. Each internal node refers to an input variable or an
attribute.
● The top internal node is called the Root Node.
● The decision tree in the figure is a binary tree, in that each internal node has no more than two branches. The branching of a node is referred to as a split.
● The depth of a node is the minimum number of steps required to reach the node from the root.
● Leaf nodes are at the end of the last branches on the tree. They represent class labels—the outcome
of all the prior decisions.
● The path from the root to a leaf node contains a series of decisions made at various internal
nodes.
● The simplest short tree is called a decision stump, a decision tree with the root node immediately connected to leaf nodes.
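The structure described above can be made concrete with a small sketch that is not from the slides: a decision tree represented as nested Python dictionaries, where each internal node tests an attribute, each branch carries an attribute value, and each leaf holds a class label. The attributes Income and Student and the sample record are made up for illustration.

# Illustrative tree: internal nodes test an attribute, leaves hold class labels.
tree = {
    "attribute": "Income",            # root node (an internal / decision node)
    "branches": {
        "High": {"label": "Yes"},     # leaf node: class label
        "Low": {
            "attribute": "Student",   # internal node at depth 1
            "branches": {
                "Yes": {"label": "Yes"},
                "No":  {"label": "No"},
            },
        },
    },
}

def predict(node, record):
    """Follow the path of decisions from the root down to a leaf."""
    while "label" not in node:            # stop at a leaf node
        value = record[node["attribute"]] # test the input variable at this node
        node = node["branches"][value]    # take the matching branch
    return node["label"]

print(predict(tree, {"Income": "Low", "Student": "Yes"}))  # -> "Yes"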
Decision trees
Overview
General algorithm
[Figure: general workflow - construct the decision tree, train the model, and test the model; the trained model maps input variables to a class label, predicting Yes or No.]
Decision trees-General algorithm
Step 1: Choose the most informative attribute.
Decision trees-General algorithm
How is the most informative attribute chosen? With an entropy-based method.
Decision trees-Example
Entropy-based method:
● Entropy measures the impurity of an attribute, i.e. the randomness or uncertainty in the data.
● Information gain measures the purity of an attribute.
Decision trees-General algorithm
Entropy equation: Entropy(S) = - Σ P(x) log2 P(x), summed over the class labels x.
● Entropy = 0 when every P(x) is 0 or 1 (the set is pure).
● Entropy = 1 when the probability of the class labels is split 50/50 (maximum impurity for two classes).
Decision trees-General algorithm
Information gain of attribute A:
IG(S, A) = Entropy(S) - Σ_v (|S_v| / |S|) · Entropy(S_v), where the sum runs over the values v of A and S_v is the subset of records with A = v.
Training dataset: 14 records with attributes Age, Income, Student, and Credit, and a Yes/No class label (9 Yes, 5 No).
Decision Tree Example
● Entropy(S) = I(9 Yes, 5 No) = -(9/14) log2(9/14) - (5/14) log2(5/14)
● Entropy(S) = 0.94
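As a quick check of this calculation, the same entropy can be computed with Python's math module:

import math

p_yes, p_no = 9 / 14, 5 / 14
entropy_s = -p_yes * math.log2(p_yes) - p_no * math.log2(p_no)  # entropy of the 9 Yes / 5 No split
print(round(entropy_s, 2))  # 0.94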
Decision Tree Example
[Figure: candidate split on Age; the Age <= 30 branch contains 5 records, 2 Yes and 3 No.]
Decision Tree Example
● IG(S, Age) = 0.26
● IG(S, Student) = 0.15
● IG(S, Credit) = 0.048
● IG(S, Income) = 0.029
The highest gain is given by Age, so the Age attribute is placed at the root of the tree.
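To make the gain computation explicit, here is a small sketch for IG(S, Age). Only the Age <= 30 counts (2 Yes, 3 No) appear on the slides; the counts used for the other two Age partitions (4 Yes / 0 No and 3 Yes / 2 No) are taken from the standard 14-record example and should be read as assumptions.

import math

def entropy(counts):
    """Entropy of a class distribution given as a sequence of counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# Class counts (Yes, No) in each Age partition. Only the first partition's
# counts appear on the slides; the other two are assumed for illustration.
partitions = [(2, 3), (4, 0), (3, 2)]
total = sum(sum(p) for p in partitions)                        # 14 records

remainder = sum(sum(p) / total * entropy(p) for p in partitions)
info_gain = entropy((9, 5)) - remainder
print(round(info_gain, 2))  # ~0.25, close to the slide's reported IG(S, Age) = 0.26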
Decision Tree Example
[Figure: the tree is grown one split at a time. Age is placed at the root; one Age branch leads directly to the leaf Yes, another branch is split further on Student (leaves Yes and No), and the remaining branch is split on Credit (leaves Yes and No).]
Decision trees
Overview
General algorithm
Decision Tree Algorithm
ID3(A, P, T)
  if T = ∅
    return ∅
  if all records in T have the same value for P
    return a single node with that value
  if A = ∅
    return a single node with the most frequent value of P in T
  Compute the information gain for each attribute in A relative to T
  Pick the attribute D with the largest gain
  Let {d1, d2, ..., dm} be the values of attribute D
  Partition T into {T1, T2, ..., Tm} according to the values of D
  return a tree with root D and branches labeled d1, d2, ..., dm, going respectively to the trees
    ID3(A - {D}, P, T1), ID3(A - {D}, P, T2), ..., ID3(A - {D}, P, Tm)
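A runnable sketch of the ID3 pseudocode above, assuming the training set T is a list of Python dicts, P is the name of the class-label field, and A is a list of attribute names; the helper functions entropy and information_gain are illustrative, not part of the original pseudocode.

import math
from collections import Counter

def entropy(records, target):
    """Entropy of the class distribution in a set of records."""
    counts = Counter(r[target] for r in records)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(records, attr, target):
    """Reduction in entropy obtained by partitioning the records on attr."""
    total = len(records)
    remainder = 0.0
    for value in set(r[attr] for r in records):
        subset = [r for r in records if r[attr] == value]
        remainder += len(subset) / total * entropy(subset, target)
    return entropy(records, target) - remainder

def id3(attributes, target, records):
    if not records:                                   # T is empty
        return None
    labels = {r[target] for r in records}
    if len(labels) == 1:                              # all records share one class label
        return labels.pop()
    if not attributes:                                # no attributes left: majority class
        return Counter(r[target] for r in records).most_common(1)[0][0]
    # pick the attribute D with the largest information gain
    best = max(attributes, key=lambda a: information_gain(records, a, target))
    tree = {best: {}}
    for value in set(r[best] for r in records):       # one branch per value of D
        subset = [r for r in records if r[best] == value]
        tree[best][value] = id3([a for a in attributes if a != best], target, subset)
    return tree

# Example call (records as dicts, illustrative names): id3(["Age", "Student"], "Buys", records)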
C4.5
● The C4.5 algorithm introduces a number of improvements over the original ID3 algorithm.
● The C4.5 algorithm can handle missing data.
● If the training records contain unknown attribute values, C4.5 evaluates the gain for an attribute by considering only the records where that attribute is defined.
● Both categorical and continuous attributes are supported by C4.5.
● Values of a continuous variable are sorted and partitioned.
● For the corresponding records of each partition, the gain is calculated, and the partition that maximizes the gain is chosen for the next split (a small sketch of this threshold search follows this list).
● The ID3 algorithm may construct a deep and complex tree, which would cause overfitting.
● The C4.5 algorithm addresses the overfitting problem in ID3 by using a bottom-up technique called
pruning to simplify the tree by removing the least visited nodes and branches.
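The sorting-and-partitioning idea for continuous attributes can be sketched as a simple binary threshold search. This is an illustration of the idea, not C4.5's exact procedure, and the ages and labels at the bottom are made-up data.

import math

def entropy(labels):
    total = len(labels)
    return -sum((labels.count(c) / total) * math.log2(labels.count(c) / total)
                for c in set(labels))

def best_threshold(values, labels):
    """Sort a continuous attribute and pick the binary split with the largest gain."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best_gain, best_t = 0.0, None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                              # no boundary between equal values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2   # candidate threshold (midpoint)
        left = [l for v, l in pairs if v <= t]
        right = [l for v, l in pairs if v > t]
        gain = base - (len(left) / len(pairs)) * entropy(left) \
                    - (len(right) / len(pairs)) * entropy(right)
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_t, best_gain

# Illustrative data (assumed): ages with Yes/No class labels.
print(best_threshold([25, 32, 41, 28, 36, 50], ["No", "Yes", "Yes", "No", "Yes", "No"]))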
CART
● CART (or Classification And Regression Trees) is often used as a generic acronym for the decision
tree, although it is a specific implementation.
● Similar to C4.5, CART can handle continuous attributes.
● Whereas C4.5 uses entropy-based criteria to rank tests, CART uses the Gini diversity index, Gini(S) = 1 - Σ P(x)^2, summed over the class labels x (see the example after this list).
● Whereas C4.5 employs stopping rules, CART constructs a sequence of subtrees, uses
cross-validation to estimate the misclassification cost of each subtree, and chooses the one with the
lowest cost.
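As an illustration only (not from the slides): scikit-learn's DecisionTreeClassifier implements a CART-style tree and lets you choose between the Gini index and entropy as the splitting criterion. The tiny encoded dataset below is made up for the example.

# Requires scikit-learn. The feature encoding and dataset are illustrative.
from sklearn.tree import DecisionTreeClassifier, export_text

# Encoded records: [age_group, student] with age_group in {0, 1, 2} and student in {0, 1}.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [2, 0], [2, 1]]
y = ["No", "Yes", "Yes", "Yes", "No", "Yes"]

# criterion="gini" uses the Gini diversity index; criterion="entropy" ranks
# splits by information gain instead.
clf = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
clf.fit(X, y)
print(export_text(clf, feature_names=["age_group", "student"]))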
Decision trees
Overview
General algorithm
Bayes' Algorithm
Smoothing
Diagnostics
Naïve Bayes Applications
● Text Classification
○ Spam Filtering
○ Document classification based on topic (Technology, Sports, etc.)
○ Sentiment Analysis
○ Information Retrieval
● Speech Recognition
● Image Classification (e.g. Face detection)
● Target Marketing
● Medical Field
○ Disease detection
○ Treatment Detection
Naïve Bayes- Overview
Naïve Bayes is a supervised learning method for classification.
Naïve Bayes- Overview
Bayes' Theorem: P(C | A) = P(A | C) · P(C) / P(A)
where
C is the class label
A is the observed attributes
Naïve Bayes- Bayes' Theorem
In P(C | A) = P(A | C) · P(C) / P(A), the term P(C | A) is the posterior probability, P(A | C) is the likelihood, P(C) is the class prior probability, and P(A) is the predictor prior probability.
The Naïve Bayes classifier calculates the probability of an event in the following steps:
Step 1: Calculate the prior probability for the given class labels.
Step 2: Find the likelihood probability of each attribute for each class.
Step 3: Put these values into the Bayes formula and calculate the posterior probability.
Step 4: Assign the input to the class with the higher posterior probability, as in the sketch below.
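A minimal from-scratch sketch of these four steps for categorical attributes; the five-row mini dataset and its attribute names are illustrative, not the full 14-record table used later in the slides.

from collections import Counter, defaultdict

data = [
    {"Outlook": "Rainy", "Windy": "T", "Play": "No"},
    {"Outlook": "Rainy", "Windy": "F", "Play": "No"},
    {"Outlook": "Sunny", "Windy": "F", "Play": "Yes"},
    {"Outlook": "Overcast", "Windy": "T", "Play": "Yes"},
    {"Outlook": "Sunny", "Windy": "F", "Play": "Yes"},
]
target, attrs = "Play", ["Outlook", "Windy"]

# Step 1: prior probability for each class label.
class_counts = Counter(r[target] for r in data)
priors = {c: n / len(data) for c, n in class_counts.items()}

# Step 2: likelihood of each attribute value given each class.
likelihood = defaultdict(lambda: defaultdict(float))
for a in attrs:
    for c in class_counts:
        rows = [r for r in data if r[target] == c]
        for v, n in Counter(r[a] for r in rows).items():
            likelihood[(a, v)][c] = n / len(rows)

# Steps 3 and 4: posterior (up to the common factor P(A)) and pick the maximum.
def classify(record):
    scores = {}
    for c in class_counts:
        score = priors[c]
        for a in attrs:
            score *= likelihood[(a, record[a])][c]
        scores[c] = score
    return max(scores, key=scores.get)

print(classify({"Outlook": "Rainy", "Windy": "T"}))  # -> "No"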
NAIVE BAYES EXAMPLE
Training data (Play Golf is the class label; the first two of 14 records are shown):
Outlook | Temp | Humidity | Windy | Play Golf
Rainy | Hot | High | F | No
Rainy | Hot | High | T | No
Class priors: of the 14 records, 9 are Yes and 5 are No, so P(Yes) = 9/14 and P(No) = 5/14.
NAIVE BAYES EXAMPLE
Frequency table for Outlook (Play Golf counts):
                  Yes | No
Outlook Sunny       3 |  2
        Overcast    4 |  0
        Rainy       2 |  3
        Total       9 |  5
NAIVE BAYES EXAMPLE
Frequency and likelihood tables for Outlook (Play Golf):
                  Frequency (Yes | No)    Likelihood (Yes | No)
Outlook Sunny          3 | 2                   3/9 | 2/5
        Overcast       4 | 0                   4/9 | 0/5
        Rainy          2 | 3                   2/9 | 3/5
        Total          9 | 5
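The likelihood entries above are just the frequency counts divided by the class totals, which a few lines of Python can reproduce:

# Derive the likelihood table from the Outlook frequency counts:
# P(value | class) = count(value, class) / count(class).
freq = {"Sunny": (3, 2), "Overcast": (4, 0), "Rainy": (2, 3)}  # (Yes, No) counts
total_yes, total_no = 9, 5

for outlook, (yes, no) in freq.items():
    print(f"P({outlook} | Yes) = {yes}/{total_yes} = {yes / total_yes:.3f}, "
          f"P({outlook} | No) = {no}/{total_no} = {no / total_no:.3f}")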
NAIVE BAYES EXAMPLE- Frequency & Likelihood Tables
[Figure: frequency and likelihood tables built in the same way for the remaining attributes.]
Combining the class prior with the attribute likelihoods for the test record X gives P(No | X) = 0.020.
                               Actual Value
                        Positive (1)       Negative (0)
Predicted  Positive (1) True Positive      False Positive
Value      Negative (0) False Negative     True Negative
Diagnostics of Classifier
                                   Actual Value
                            Subscribed         Non-Subscribed
Predicted  Subscribed       True Positive      False Positive
Value      Non-Subscribed   False Negative     True Negative
Diagnostics of Classifier
                                   Actual Value
                            Subscribed    Non-Subscribed    Total
Predicted  Subscribed       TP = 3        FP = 2                5
Value      Non-Subscribed   FN = 8        TN = 87              95
           Total                11            89              100
Diagnostics of Classifier
Performance is measured with metrics computed from the confusion matrix, such as accuracy, precision, recall (the true positive rate), the false positive rate, and the F-measure.
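Using the counts from the confusion matrix above (TP = 3, FP = 2, FN = 8, TN = 87), the standard metrics work out as follows:

TP, FP, FN, TN = 3, 2, 8, 87

accuracy  = (TP + TN) / (TP + TN + FP + FN)                 # 0.90
precision = TP / (TP + FP)                                  # 0.60
recall    = TP / (TP + FN)                                  # true positive rate, ~0.27
fpr       = FP / (FP + TN)                                  # false positive rate, ~0.022
f1        = 2 * precision * recall / (precision + recall)   # ~0.375

print(accuracy, precision, round(recall, 3), round(fpr, 3), round(f1, 3))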
Additional Classifier Method
Bagging
● Bagging (or bootstrap aggregating) uses the bootstrap technique that repeatedly
samples with replacement from a dataset according to a uniform probability
distribution.
● “With replacement” means that when a sample is selected for a training or testing set,
the sample is still kept in the dataset and may be selected again.
● Because the sampling is with replacement, some samples may appear several times
in a training or testing set, whereas others may be absent.
● A model or base classifier is trained separately on each bootstrap sample, and a test
sample is assigned to the class that received the highest number of votes.
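A hedged sketch of bagging with scikit-learn's BaggingClassifier, which trains each decision tree on a bootstrap sample and combines them by voting; the synthetic dataset is generated only for illustration.

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# bootstrap=True: each base tree sees a sample drawn with replacement.
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=25,
                        bootstrap=True, random_state=0)
bag.fit(X_train, y_train)
print("bagging accuracy:", bag.score(X_test, y_test))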
Additional Classifier Method
Boosting
● Similar to bagging, boosting (or AdaBoost) uses votes for classification to combine
the output of individual models.
● In addition, it combines models of the same type.
● However, boosting is an iterative procedure where a new model is influenced by the
performances of those models built previously.
● Furthermore, boosting assigns a weight to each training sample that reflects its
importance, and the weight may adaptively change at the end of each boosting round.
● Bagging and boosting have been shown to perform better than a single decision tree.
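A matching sketch for boosting with scikit-learn's AdaBoostClassifier, which builds models iteratively and reweights the training samples after each round; again, the dataset is synthetic and purely illustrative.

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each boosting round fits a new weak learner on the reweighted training samples.
boost = AdaBoostClassifier(n_estimators=50, random_state=0)
boost.fit(X_train, y_train)
print("boosting accuracy:", boost.score(X_test, y_test))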
Additional Classifier Method
Random Forest
● A random forest trains an ensemble of decision trees, each on a bootstrap sample of the data, considering a random subset of attributes at each split; a test sample is assigned to the class with the most votes among the trees.
Support Vector Machine (SVM)
● SVM is another common classification method that combines linear models with
instance-based learning techniques.
● Support vector machines select a small number of critical boundary instances called
support vectors from each class and build a linear decision function that separates
them as widely as possible.
● SVM by default can efficiently perform linear classifications and can be configured to
perform nonlinear classifications as well.
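A brief illustration of this point with scikit-learn's SVC: kernel="linear" gives a linear classifier, while a kernel such as "rbf" makes the classification nonlinear, and the number of support vectors selected from each class is available after fitting. The dataset is synthetic and used only for illustration.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for kernel in ("linear", "rbf"):
    clf = SVC(kernel=kernel).fit(X_train, y_train)
    print(kernel, "accuracy:", clf.score(X_test, y_test),
          "support vectors:", clf.n_support_.sum())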
How to choose a suitable classifier among decision trees, naïve Bayes, and logistic regression?