Lecture 7 - Classification (Rules and Naïve Bayes)
Overview
• Rule-based technique
• Bayesian technique
• Related Issues
• Comparison of Classification Techniques
• Classification in Practice
Introduction to Rule-based Technique
• General idea
• A rule format: IF condition THEN Class = c.
• Training: rules are learnt from the training examples either directly (rule induction, e.g. sequential covering) or indirectly (e.g. extracted from a decision tree).
• Therefore, a rule model consists of a set of IF-THEN rules learnt from
the training set.
• A rule covers a record if the attribute values of the record make the
condition true.
• To classify an unseen record, rules are searched from the beginning. A
rule is fired when the rule covers the record, and the class label in the
rule is assigned to the record.
Rule-based Technique
• Classification rule
(A1 op1 v1) ∧ (A2 op2 v2) ∧ … ∧ (Am opm vm) → Class = yi
where Aj: attribute name, vj: a value of Aj, opj: comparison operator (e.g. <, >, =, ≤, ≥, ≠), yi: class label.
• Rule quality measurement
coverage(r) = |A| / |D|        accuracy(r) = |A ∩ y| / |A|
where |D|: the number of records in data set D, |A|: the number of records covered by rule r, and |A ∩ y|: the number of records covered by the rule that also have class label y.
Rule-based Technique
• Ideally, rules should be mutually exclusive and exhaustive.
• Mutually exclusive rules: no more than one rule is triggered by a
record.
• Exhaustive rules: each combination of attribute values is covered
by at least one rule.
• Mutual exclusivity is desirable to avoid conflicts in class
predictions by different rules covering the same record.
• Rules may be organised as an ordered list or an unordered set.
• Ordered rules: tried in sequence from the beginning of the model
and ordered by priority. Only one rule is fired.
• Unordered rules: all rules are tried and more than one rule may be
fired. Majority voting can be adopted to determine the final class
outcome.
• Default rule: () → yj is added as the final rule, where the class label yj normally refers to the majority class of the training records not yet covered by the existing rules.
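A minimal sketch (not from the slides) of how an ordered rule list with a default rule classifies a record, together with a majority-voting variant for unordered rule sets; the rules and attribute names used here are hypothetical.

```python
from collections import Counter

# Hypothetical ordered rule list: each rule is (antecedent_dict, class_label).
# The final rule has an empty antecedent -- the default rule.
RULES = [
    ({"Can Fly": "yes", "Blood Type": "warm"}, "birds"),
    ({"Give Birth": "yes"}, "mammals"),
    ({}, "reptiles"),   # default rule: assumed majority class of the uncovered training records
]

def covers(antecedent, record):
    # An empty antecedent (the default rule) covers every record.
    return all(record.get(attr) == value for attr, value in antecedent.items())

def classify_ordered(rules, record):
    """Ordered list: rules are tried by priority; only the first covering rule fires."""
    for antecedent, label in rules:
        if covers(antecedent, record):
            return label

def classify_unordered(rules, record):
    """Unordered set: every covering rule fires; majority voting decides the class."""
    votes = [label for antecedent, label in rules if covers(antecedent, record)]
    return Counter(votes).most_common(1)[0][0] if votes else None

record = {"Blood Type": "cold", "Give Birth": "no", "Can Fly": "no", "Live in Water": "no"}
print(classify_ordered(RULES, record))   # -> reptiles (only the default rule covers it)
```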
Rule-based Approach
• Sequential covering algorithm (illustrated)
[Figure: rules R1–R4 are learnt one at a time, each covering a region of the example space]
Rule-based Approach
• Learn_One_Rule:
• Rule Growing (Specialization):
1. Create an initial rule { } → y with an empty antecedent (it covers every record);
2. Select the best possible attribute-value pair and add it to the antecedent;
3. Repeat Step 2 until the rule quality no longer improves.
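A compact sketch of Learn_One_Rule (general-to-specific rule growing) and the sequential covering loop around it, under the usual textbook formulation; the function names, the use of plain accuracy as the rule-quality measure, and the stopping criteria are assumptions, not the lecture's exact choices.

```python
def covers(antecedent, record):
    return all(record[a] == v for a, v in antecedent.items())

def accuracy(antecedent, label, data):
    covered = [r for r in data if covers(antecedent, r)]
    if not covered:
        return 0.0
    return sum(r["Class"] == label for r in covered) / len(covered)

def learn_one_rule(data, label, attributes):
    antecedent = {}                                    # Step 1: {} -> label (covers everything)
    best = accuracy(antecedent, label, data)
    improved = True
    while improved:                                    # Step 3: repeat until no improvement
        improved = False
        candidates = [(a, r[a]) for r in data for a in attributes if a not in antecedent]
        for attr, value in set(candidates):            # Step 2: try each attribute-value pair
            quality = accuracy({**antecedent, attr: value}, label, data)
            if quality > best:
                best, best_pair, improved = quality, (attr, value), True
        if improved:
            antecedent[best_pair[0]] = best_pair[1]
    return antecedent

def sequential_covering(data, label, attributes, min_coverage=1):
    """Learn rules for one class, removing covered records after each rule."""
    rules, remaining = [], list(data)
    while any(r["Class"] == label for r in remaining):
        antecedent = learn_one_rule(remaining, label, attributes)
        covered = [r for r in remaining if covers(antecedent, r)]
        if len(covered) < min_coverage:
            break
        rules.append((antecedent, label))
        remaining = [r for r in remaining if not covers(antecedent, r)]
    return rules
```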
Rule-based Approach
• Rule pruning
• Extracted rules may be further modified to improve their generality.
• Similar to tree pruning, rule pruning further removes or generalizes
attribute-value pairs in the antecedent part of the rules.
• Rule pruning is normally conducted using validation records.
• Rule pruning can be considered a way to reduce overfitting in rule-based classification.
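A hedged sketch of one common pruning scheme (reduced-error style): attribute-value pairs are dropped from the antecedent as long as accuracy on held-out validation records does not decrease. The exact criterion used in the lecture may differ.

```python
def covers(antecedent, record):
    return all(record[a] == v for a, v in antecedent.items())

def val_accuracy(antecedent, label, validation):
    covered = [r for r in validation if covers(antecedent, r)]
    if not covered:
        return 0.0
    return sum(r["Class"] == label for r in covered) / len(covered)

def prune_rule(antecedent, label, validation):
    """Greedily generalize a rule by removing conjuncts, guided by validation records."""
    pruned = dict(antecedent)
    improved = True
    while improved and pruned:
        improved = False
        base = val_accuracy(pruned, label, validation)
        for attr in list(pruned):
            candidate = {a: v for a, v in pruned.items() if a != attr}
            if val_accuracy(candidate, label, validation) >= base:
                pruned = candidate      # dropping attr generalizes the rule without hurting accuracy
                improved = True
                break
    return pruned
```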
Rule-based Approach
Name | Blood Type | Give Birth | Can Fly | Live in Water | Class
human | warm | yes | no | no | mammals
python | cold | no | no | no | reptiles
salmon | cold | no | no | yes | fishes
whale | warm | yes | no | yes | mammals
frog | cold | no | no | sometimes | amphibians
komodo | cold | no | no | no | reptiles
bat | warm | yes | yes | no | mammals
pigeon | warm | no | yes | no | birds
cat | warm | yes | no | no | mammals
leopard shark | cold | yes | no | yes | fishes
turtle | cold | no | no | sometimes | reptiles
penguin | warm | no | no | sometimes | birds
porcupine | warm | yes | no | no | mammals
eel | cold | no | no | yes | fishes
salamander | cold | no | no | sometimes | amphibians
gila monster | cold | no | no | no | reptiles
platypus | warm | no | no | no | mammals
owl | warm | no | yes | no | birds
dolphin | warm | yes | no | yes | mammals
eagle | warm | no | yes | no | birds
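A small sketch applying the coverage and accuracy measures defined earlier to this table; the candidate rule (Give Birth = yes) ∧ (Blood Type = warm) → mammals is just an illustrative choice.

```python
ATTRS = ["Name", "Blood Type", "Give Birth", "Can Fly", "Live in Water", "Class"]

ROWS = [
    ("human", "warm", "yes", "no", "no", "mammals"),
    ("python", "cold", "no", "no", "no", "reptiles"),
    ("salmon", "cold", "no", "no", "yes", "fishes"),
    ("whale", "warm", "yes", "no", "yes", "mammals"),
    ("frog", "cold", "no", "no", "sometimes", "amphibians"),
    ("komodo", "cold", "no", "no", "no", "reptiles"),
    ("bat", "warm", "yes", "yes", "no", "mammals"),
    ("pigeon", "warm", "no", "yes", "no", "birds"),
    ("cat", "warm", "yes", "no", "no", "mammals"),
    ("leopard shark", "cold", "yes", "no", "yes", "fishes"),
    ("turtle", "cold", "no", "no", "sometimes", "reptiles"),
    ("penguin", "warm", "no", "no", "sometimes", "birds"),
    ("porcupine", "warm", "yes", "no", "no", "mammals"),
    ("eel", "cold", "no", "no", "yes", "fishes"),
    ("salamander", "cold", "no", "no", "sometimes", "amphibians"),
    ("gila monster", "cold", "no", "no", "no", "reptiles"),
    ("platypus", "warm", "no", "no", "no", "mammals"),
    ("owl", "warm", "no", "yes", "no", "birds"),
    ("dolphin", "warm", "yes", "no", "yes", "mammals"),
    ("eagle", "warm", "no", "yes", "no", "birds"),
]
DATA = [dict(zip(ATTRS, row)) for row in ROWS]

def covers(antecedent, record):
    """A rule covers a record if every attribute-value pair in the antecedent matches."""
    return all(record[attr] == value for attr, value in antecedent.items())

def coverage_and_accuracy(antecedent, label, data):
    covered = [r for r in data if covers(antecedent, r)]
    correct = [r for r in covered if r["Class"] == label]
    coverage = len(covered) / len(data)                           # |A| / |D|
    accuracy = len(correct) / len(covered) if covered else 0.0    # |A ∩ y| / |A|
    return coverage, accuracy

# Example rule: (Give Birth = yes) AND (Blood Type = warm) -> mammals
rule = ({"Give Birth": "yes", "Blood Type": "warm"}, "mammals")
print(coverage_and_accuracy(*rule, DATA))  # -> (0.3, 1.0): covers 6 of 20 records, all mammals
```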
Naïve Bayes
P(c | x) ∝ P(x1 | c) × P(x2 | c) × ⋯ × P(xn | c) × P(c)
(proportional because the evidence P(x) is the same for every class and can be dropped when comparing classes)
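A minimal sketch of this computation for categorical attributes. Log-probabilities and Laplace smoothing are standard practical additions, not something stated on the slide, and the training rows are a small subset of the animal table shown earlier.

```python
from collections import Counter, defaultdict
from math import log

TRAIN = [  # (features, class) -- subset of the animal table
    ({"Blood Type": "warm", "Give Birth": "yes", "Can Fly": "no"}, "mammals"),
    ({"Blood Type": "cold", "Give Birth": "no",  "Can Fly": "no"}, "reptiles"),
    ({"Blood Type": "warm", "Give Birth": "yes", "Can Fly": "yes"}, "mammals"),
    ({"Blood Type": "warm", "Give Birth": "no",  "Can Fly": "yes"}, "birds"),
    ({"Blood Type": "cold", "Give Birth": "no",  "Can Fly": "no"}, "fishes"),
    ({"Blood Type": "warm", "Give Birth": "yes", "Can Fly": "no"}, "mammals"),
]

def fit(train):
    class_counts = Counter(label for _, label in train)
    value_counts = defaultdict(lambda: defaultdict(Counter))   # value_counts[c][attr][value]
    domains = defaultdict(set)                                 # distinct values seen per attribute
    for features, label in train:
        for attr, value in features.items():
            value_counts[label][attr][value] += 1
            domains[attr].add(value)
    return class_counts, value_counts, domains

def predict(class_counts, value_counts, domains, record, alpha=1):
    n = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = log(count / n)                                 # log P(c)
        for attr, value in record.items():
            num = value_counts[label][attr][value] + alpha     # Laplace-smoothed P(x_i | c)
            den = count + alpha * len(domains[attr])
            score += log(num / den)
        scores[label] = score
    return max(scores, key=scores.get)                         # class with the highest score

model = fit(TRAIN)
print(predict(*model, {"Blood Type": "warm", "Give Birth": "no", "Can Fly": "yes"}))  # -> birds
```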
Naïve Bayes Model
Dataset
How to classify the following record using the Naïve Bayes model?
Outlook | Temperature | Humidity | Windy | Class
overcast | mild | normal | FALSE | ?
Related Issues
• Class imbalance
• Most techniques have problems in classifying minority classes
• Accuracy does not reflect the true performance of a model
• Especially when the dataset used is imbalanced, or
• if the classification rule is changed to always output a single class (e.g. always the majority class)
• More realistic evaluation measures (see the sketch after this list):
• Precision: TP/(TP+FP)
• Recall: TP/(TP+FN)
• F1-measure: 2PR/(P+R)
• Increasing the proportion of minority classes
• Under-sampling the majority class
• Over-sampling the minority class
• Combination of the two
• Use of ensemble methods
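A short sketch of the evaluation measures and of naive random over-sampling; the helper names are hypothetical, and libraries such as imbalanced-learn provide more principled resampling (e.g. SMOTE).

```python
import random

def precision_recall_f1(y_true, y_pred, positive):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

def oversample_minority(records, minority_label, label_of, rng=random.Random(0)):
    """Duplicate random minority records until the two classes are balanced."""
    minority = [r for r in records if label_of(r) == minority_label]
    majority = [r for r in records if label_of(r) != minority_label]
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return records + extra

# Example: a classifier that always predicts the majority class reaches 95%
# accuracy here, yet has zero precision, recall and F1 on the minority class.
y_true = ["neg"] * 95 + ["pos"] * 5
y_pred = ["neg"] * 100
print(precision_recall_f1(y_true, y_pred, positive="pos"))  # -> (0.0, 0.0, 0.0)
```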
Classification in Practice