Lecture 7 - Classification (Rules and Naïve Bayes)


Lecture 7

Other Classification Techniques: Rules and Naïve Bayes

Overview

• Rule-based technique
• Bayesian technique
• Related Issues
• Comparison of Classification Techniques
• Classification in Practice

Introduction to Rule-based Technique

• General idea
• A rule format: IF condition THEN Class = c.
• Training: rules are learnt from a set of training examples either directly
(rule induction) or indirectly (rules extracted from a decision tree).
• Therefore, a rule model consists of a set of IF-THEN rules learnt from
the training set.
• A rule covers a record if the attribute values of the record make the
condition true.
• To classify an unseen record, rules are searched from the beginning. A
rule is fired when the rule covers the record, and the class label in the
rule is assigned to the record.
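
A minimal Python sketch of this lookup, assuming a rule is stored as a (condition, class label) pair and a record as a dictionary. The names `covers` and `classify` and the two weather-style rules are hypothetical and only illustrate the idea.

```python
from typing import Callable, Optional

# A rule is a (condition, class label) pair; the condition is a predicate on a record.
Rule = tuple[Callable[[dict], bool], str]


def covers(rule: Rule, record: dict) -> bool:
    """A rule covers a record if the record makes the rule's condition true."""
    condition, _ = rule
    return condition(record)


def classify(rules: list[Rule], record: dict) -> Optional[str]:
    """Search the rules from the beginning and fire the first one that covers the record."""
    for rule in rules:
        if covers(rule, record):
            return rule[1]
    return None  # no rule covers the record


# Hypothetical rules: IF condition THEN Class = c
rules = [
    (lambda r: r["Outlook"] == "sunny" and r["Humidity"] == "high", "N"),
    (lambda r: r["Outlook"] == "overcast", "P"),
]
record = {"Outlook": "overcast", "Humidity": "normal"}
print(classify(rules, record))
```

Returning `None` when nothing fires is a design choice of this sketch; a default rule is the usual way to avoid it.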

Rule-based Technique

• Classification rule
(A1 op1 v1) ∧ (A2 op2 v2) ∧ … ∧ (Am opm vm) → Class = yi

where Aj: attribute name, vj: value of Aj, opj: comparison operator
(e.g. <, >, =), yi: class label.
• Rule quality measurement
| A| |A y|
coverage(r ) = accuracy(r ) =
|D| |A|
where |D|: the number of records in data set D, |A|: the number of
records covered by the rule, and |A  y|: the number of records
covered by the rule and having class label y.
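
The two measures can be computed directly from their definitions. The sketch below assumes records are dictionaries and uses a small made-up data set; the function name `coverage_and_accuracy` is hypothetical.

```python
def coverage_and_accuracy(records, condition, label, class_key="Class"):
    """Return (coverage, accuracy) of the rule `condition -> label` on `records`."""
    covered = [r for r in records if condition(r)]            # A
    correct = [r for r in covered if r[class_key] == label]   # A ∩ y
    coverage = len(covered) / len(records)                    # |A| / |D|
    accuracy = len(correct) / len(covered) if covered else 0.0  # |A ∩ y| / |A|
    return coverage, accuracy


# Made-up data set D with five records
records = [
    {"Give Birth": "yes", "Blood Type": "warm", "Class": "mammals"},
    {"Give Birth": "yes", "Blood Type": "cold", "Class": "fishes"},
    {"Give Birth": "no", "Blood Type": "warm", "Class": "birds"},
    {"Give Birth": "yes", "Blood Type": "warm", "Class": "mammals"},
    {"Give Birth": "no", "Blood Type": "cold", "Class": "reptiles"},
]

# Rule: (Give Birth = yes) -> mammals covers 3 of 5 records, 2 of them mammals
cov, acc = coverage_and_accuracy(records, lambda r: r["Give Birth"] == "yes", "mammals")
print(cov, acc)
```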

Rule-based Technique
• Ideally, rules should be mutually exclusive and exhaustive.
• Mutually exclusive rules: no more than one rule is triggered by a
record.
• Exhaustive rules: each combination of attribute values is covered
by at least one rule.
• Mutual exclusivity is desirable to avoid conflicts in class
predictions by different rules covering the same record.
• Rules may be listed as an ordered list or an unordered set.
• Ordered rules: tried in sequence from the beginning of the model
and ordered by priority. Only one rule is fired.
• Unordered rules: all rules are tried and more than one rule may be
fired. Majority voting can be adopted to determine the final class
outcome.
• Default rule: () → yj is placed as the final rule, where the class label yj
normally refers to the majority class of the training records not yet covered
by the existing rules.
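
The difference between the two schemes can be sketched as follows, assuming rules are (condition, label) pairs. The function names, attribute names and rules are hypothetical; the default class plays the role of the default rule.

```python
from collections import Counter


def classify_ordered(rules, record, default):
    """Ordered rules: try in sequence and fire only the first rule that covers the record."""
    for condition, label in rules:
        if condition(record):
            return label
    return default  # default rule: no other rule covers the record


def classify_unordered(rules, record, default):
    """Unordered rules: fire every covering rule and take a majority vote."""
    votes = Counter(label for condition, label in rules if condition(record))
    if not votes:
        return default
    return votes.most_common(1)[0][0]


# Hypothetical rules over made-up numeric attributes x, y, z
rules = [
    (lambda r: r["x"] > 0, "A"),
    (lambda r: r["y"] > 0, "B"),
    (lambda r: r["z"] > 0, "B"),
]
record = {"x": 1, "y": 1, "z": 1}
print(classify_ordered(rules, record, default="C"))    # first covering rule wins
print(classify_unordered(rules, record, default="C"))  # two votes for B, one for A
```

The same record can therefore receive different labels under the two schemes, which is why the ordering (or voting) policy is part of the model.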
Rule-based Approach

• Rule Ordering Schemes


• Ordering by Rule:
• Use a rule quality measure (e.g. coverage and accuracy)
• Ordered in descending order of quality
• Advantage: best rule is applied first
• Disadvantage: later rules are difficult to interpret, because each one
implicitly assumes the negation of all previous rules.
• Ordering by Class:
• Grouped and ordered by class labels
• Advantage: simpler rules and easier to understand
• Disadvantage: may not apply the “best” rule

Rule-based Approach

• Sequential Covering Algorithm Outline


1. Choose a default class (normally the largest class).
2. For each of the other classes, repeat the following
a) Learn a rule that covers examples of the class (Learn_One_Rule);
b) Remove the training examples covered by the rule;
c) Add the rule to the ruleset;
d) Go back to step a) until a stopping criterion is met.
3. Create the default rule and add it at the end of the ruleset.
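
The outline above can be sketched in Python as follows. This is an illustration under simplifying assumptions, not the lecture's implementation: `learn_one_rule` here is a deliberately crude stand-in that picks the single attribute-value test with the best accuracy (ties broken by positives covered), the stopping criterion is "no examples of the class remain", and the data set is made up.

```python
from collections import Counter


def learn_one_rule(records, target, class_key):
    """Simplified stand-in for Learn_One_Rule: best single attribute-value test."""
    best, best_score = None, (0.0, 0)
    for r in records:
        for attr, value in r.items():
            if attr == class_key:
                continue
            covered = [x for x in records if x[attr] == value]
            positives = sum(x[class_key] == target for x in covered)
            score = (positives / len(covered), positives)  # (accuracy, positives covered)
            if score > best_score:
                best, best_score = (attr, value), score
    return best


def sequential_covering(records, class_key="Class"):
    counts = Counter(r[class_key] for r in records)
    default = counts.most_common(1)[0][0]                  # step 1: largest class
    ruleset, remaining = [], list(records)
    for target in [c for c in counts if c != default]:     # step 2: other classes
        while any(r[class_key] == target for r in remaining):
            test = learn_one_rule(remaining, target, class_key)      # a) learn a rule
            if test is None:
                break
            attr, value = test
            remaining = [r for r in remaining if r[attr] != value]   # b) remove covered
            ruleset.append((attr, value, target))                    # c) add to ruleset
    ruleset.append((None, None, default))                  # step 3: default rule last
    return ruleset


# Made-up training set
records = (
    [{"Can Fly": "yes", "Live in Water": "no", "Class": "birds"}] * 2
    + [{"Can Fly": "no", "Live in Water": "yes", "Class": "fishes"}] * 2
    + [{"Can Fly": "no", "Live in Water": "no", "Class": "mammals"}] * 3
)
for rule in sequential_covering(records):
    print(rule)
```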

Rule-based Approach
• Sequential covering algorithm (illustrated)

[Figure: the original training set, with the regions covered by rules R1, R2,
R3 and R4]
Rule-based Approach
• Learn_One_Rule:
• Rule Growing (Specialization):
1. Create an initial rule with an empty antecedent, {} → N;
2. Select a best possible attribute-value pair and add it into antecedent;
3. Repeat Step 2 until rule quality no longer improves.

• Rule Refining (Generalization):


1. Select a record and convert it into a rule;
2. Remove one of its conjuncts so that the rule covers more examples;
3. Repeat Step 2 until the rule starts to cover negative examples.
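
Rule growing can be sketched as a greedy loop. The version below assumes rule accuracy as the quality measure (the slide does not fix one), categorical attributes with string values, and a made-up data set; `grow_rule` and `rule_accuracy` are hypothetical names.

```python
def rule_accuracy(records, antecedent, target, class_key):
    """Accuracy of the rule `antecedent -> target`; 0.0 if it covers nothing."""
    covered = [r for r in records if all(r[a] == v for a, v in antecedent.items())]
    if not covered:
        return 0.0
    return sum(r[class_key] == target for r in covered) / len(covered)


def grow_rule(records, target, class_key="Class"):
    antecedent = {}                                        # step 1: {} -> target
    best = rule_accuracy(records, antecedent, target, class_key)
    while True:
        # Candidate attribute-value pairs on attributes not yet in the antecedent
        candidates = {(a, r[a]) for r in records for a in r
                      if a != class_key and a not in antecedent}
        scored = [(rule_accuracy(records, {**antecedent, a: v}, target, class_key), a, v)
                  for a, v in sorted(candidates)]
        if not scored:
            break
        acc, attr, value = max(scored)                     # step 2: best pair
        if acc <= best:                                    # step 3: stop when no gain
            break
        antecedent[attr] = value
        best = acc
    return antecedent, best


# Made-up training records
records = [
    {"Outlook": "sunny", "Humidity": "high", "Class": "N"},
    {"Outlook": "sunny", "Humidity": "high", "Class": "N"},
    {"Outlook": "sunny", "Humidity": "normal", "Class": "P"},
    {"Outlook": "rainy", "Humidity": "high", "Class": "P"},
    {"Outlook": "rainy", "Humidity": "normal", "Class": "P"},
    {"Outlook": "overcast", "Humidity": "high", "Class": "P"},
]
antecedent, acc = grow_rule(records, target="N")
print(antecedent, acc)
```

On this data the rule is specialised twice: first with Outlook = sunny, then with Humidity = high, after which no attribute is left to add.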

Rule-based Approach

• Rule pruning
• Extracted rules may be further modified to improve their generality.
• Similar to tree pruning, rule pruning further removes or generalizes
attribute-value pairs in the antecedent part of the rules.
• Rule pruning is normally conducted using validation records.
• Rule pruning can be considered as a way to reduce the problem of
overfitting in rule-based classification.

{Outlook=sunny && Temp=hot && Humidity=high && Windy=TRUE} → N

{Outlook=sunny && Temp=hot && Humidity=high} → N
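
One way to reproduce the pruning step above is to drop a conjunct whenever accuracy on the validation records does not fall. That criterion, the function names and the validation records below are assumptions made for illustration only.

```python
def accuracy_on(records, antecedent, target, class_key="Class"):
    """Accuracy of `antecedent -> target` on the given records (0.0 if nothing is covered)."""
    covered = [r for r in records if all(r[a] == v for a, v in antecedent.items())]
    if not covered:
        return 0.0
    return sum(r[class_key] == target for r in covered) / len(covered)


def prune_rule(antecedent, target, validation, class_key="Class"):
    """Greedily remove conjuncts while validation accuracy does not decrease."""
    pruned = dict(antecedent)
    improved = True
    while improved and len(pruned) > 1:
        improved = False
        base = accuracy_on(validation, pruned, target, class_key)
        for attr in list(pruned):
            trial = {a: v for a, v in pruned.items() if a != attr}
            if accuracy_on(validation, trial, target, class_key) >= base:
                pruned, improved = trial, True
                break
    return pruned


def rec(outlook, temp, humidity, windy, cls):
    return {"Outlook": outlook, "Temp": temp, "Humidity": humidity,
            "Windy": windy, "Class": cls}


# Made-up validation records
validation = [
    rec("sunny", "hot", "high", "TRUE", "N"),
    rec("sunny", "hot", "high", "FALSE", "N"),
    rec("overcast", "hot", "high", "TRUE", "P"),
    rec("sunny", "hot", "normal", "TRUE", "P"),
    rec("sunny", "mild", "high", "TRUE", "P"),
]
rule = {"Outlook": "sunny", "Temp": "hot", "Humidity": "high", "Windy": "TRUE"}
print(prune_rule(rule, "N", validation))
```

With these validation records only the Windy test can be removed without losing accuracy, which matches the pruned rule shown above.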

Rule-based Approach
Name           Blood Type  Give Birth  Can Fly  Live in Water  Class
human          warm        yes         no       no             mammals
python         cold        no          no       no             reptiles
salmon         cold        no          no       yes            fishes
whale          warm        yes         no       yes            mammals
frog           cold        no          no       sometimes      amphibians
komodo         cold        no          no       no             reptiles
bat            warm        yes         yes      no             mammals
pigeon         warm        no          yes      no             birds
cat            warm        yes         no       no             mammals
leopard shark  cold        yes         no       yes            fishes
turtle         cold        no          no       sometimes      reptiles
penguin        warm        no          no       sometimes      birds
porcupine      warm        yes         no       no             mammals
eel            cold        no          no       yes            fishes
salamander     cold        no          no       sometimes      amphibians
gila monster   cold        no          no       no             reptiles
platypus       warm        no          no       no             mammals
owl            warm        no          yes      no             birds
dolphin        warm        yes         no       yes            mammals
eagle          warm        no          yes      no             birds

R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds

R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes

R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals

R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles

R5: (Live in Water = sometimes) → Amphibians
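
As a check on mutual exclusivity and exhaustiveness, the sketch below applies R1 to R5 to the 20 records in the table and lists the records that trigger no rule or more than one. The data and rules are copied from this slide; the helper names are hypothetical.

```python
TABLE = """
human warm yes no no mammals
python cold no no no reptiles
salmon cold no no yes fishes
whale warm yes no yes mammals
frog cold no no sometimes amphibians
komodo cold no no no reptiles
bat warm yes yes no mammals
pigeon warm no yes no birds
cat warm yes no no mammals
leopard shark cold yes no yes fishes
turtle cold no no sometimes reptiles
penguin warm no no sometimes birds
porcupine warm yes no no mammals
eel cold no no yes fishes
salamander cold no no sometimes amphibians
gila monster cold no no no reptiles
platypus warm no no no mammals
owl warm no yes no birds
dolphin warm yes no yes mammals
eagle warm no yes no birds
"""
FIELDS = ("Blood Type", "Give Birth", "Can Fly", "Live in Water", "Class")

records = []
for line in TABLE.strip().splitlines():
    *name, blood, birth, fly, water, cls = line.split()  # names may contain spaces
    records.append({"Name": " ".join(name),
                    **dict(zip(FIELDS, (blood, birth, fly, water, cls)))})

RULES = {
    "R1": ({"Give Birth": "no", "Can Fly": "yes"}, "birds"),
    "R2": ({"Give Birth": "no", "Live in Water": "yes"}, "fishes"),
    "R3": ({"Give Birth": "yes", "Blood Type": "warm"}, "mammals"),
    "R4": ({"Give Birth": "no", "Can Fly": "no"}, "reptiles"),
    "R5": ({"Live in Water": "sometimes"}, "amphibians"),
}


def triggered(record):
    """Names of all rules whose condition is true for the record."""
    return [name for name, (cond, _) in RULES.items()
            if all(record[a] == v for a, v in cond.items())]


uncovered = [r["Name"] for r in records if not triggered(r)]
conflicts = [r["Name"] for r in records if len(triggered(r)) > 1]
print("no rule:", uncovered)
print("several rules:", conflicts)
```

On this table the rule set is neither exhaustive nor mutually exclusive, so an ordering (or voting) scheme and a default rule are needed to classify every record.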


Introduction to Bayesian Technique

• The (Naïve) Bayes classifier is a very simple yet effective classifier
• It makes two (naïve) assumptions:
• All features are equally important
• All features are independent of one another given the class
• Bayesian classification is based on Bayes' theorem: the posterior
probability of each class for a record is estimated from the prior and
conditional probabilities drawn from the training set.
• The classification model estimates the likelihood of the record belonging to
each class.
• The class with the highest probability becomes the class label for the
record.

Naïve Bayes

• P(c|x) is the posterior probability of the class (c, target) given the
predictor (x, attributes).
• P(c) is the prior probability of class.
• P(x|c) is the likelihood which is the
probability of predictor given class.
• P(x) is the prior probability of predictor.

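$$P(c \mid x) = \frac{P(x \mid c)\, P(c)}{P(x)}$$

Under the independence assumption, and because P(x) is the same for every
class:

$$P(c \mid x) \propto P(x_1 \mid c) \times P(x_2 \mid c) \times \cdots \times P(x_n \mid c) \times P(c)$$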
Naïve Bayes Model
Building the model from the dataset:

1. Calculate the probability of each class label.
2. Calculate the probability of each attribute value given the class
label.

[Figure: Dataset → Naïve Bayes Model]
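
The two training steps amount to counting. The sketch below assumes the widely used 14-record weather data with classes P and N, which is not reproduced in these slides, so treat the data as an assumption; `train` is a hypothetical name. Exact fractions are used to keep the estimates readable.

```python
from collections import Counter, defaultdict
from fractions import Fraction

ATTRS = ("Outlook", "Temperature", "Humidity", "Windy")

# Assumed training data: the classic 14-record weather data set
DATA = """
sunny hot high FALSE N
sunny hot high TRUE N
overcast hot high FALSE P
rainy mild high FALSE P
rainy cool normal FALSE P
rainy cool normal TRUE N
overcast cool normal TRUE P
sunny mild high FALSE N
sunny cool normal FALSE P
rainy mild normal FALSE P
sunny mild normal TRUE P
overcast mild high TRUE P
overcast hot normal FALSE P
rainy mild high TRUE N
"""
rows = [tuple(line.split()) for line in DATA.strip().splitlines()]


def train(rows):
    # Step 1: probability of each class label
    class_counts = Counter(row[-1] for row in rows)
    priors = {c: Fraction(n, len(rows)) for c, n in class_counts.items()}

    # Step 2: probability of each attribute value given the class label
    value_counts = defaultdict(Counter)  # (attribute, class) -> value counts
    for *values, c in rows:
        for attr, v in zip(ATTRS, values):
            value_counts[(attr, c)][v] += 1
    likelihoods = {
        (attr, c): {v: Fraction(n, class_counts[c]) for v, n in counter.items()}
        for (attr, c), counter in value_counts.items()
    }
    return priors, likelihoods


priors, likelihoods = train(rows)
print(priors)
print(likelihoods[("Outlook", "P")])
```

Values never seen with a class (for example Outlook = overcast with class N in this data) are simply absent from the table, which corresponds to an estimated probability of zero.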

How to classify a record using the Naïve Bayes model?

Unseen record:

Outlook   Temperature  Humidity  Windy  Class
overcast  mild         normal    FALSE  ?

Record classified by the Naïve Bayes model:

Outlook   Temperature  Humidity  Windy  Class
overcast  mild         normal    FALSE  P
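
A worked version of this classification, as a sketch. The probabilities below are what counting gives on the classic 14-record weather data (9 records of class P, 5 of class N); that data set is an assumption, since the slide's own dataset is not reproduced here. Each class score is the prior multiplied by the conditional probabilities of the record's attribute values.

```python
from fractions import Fraction as F
from math import prod

# Assumed estimates from the classic weather data: P(c)
priors = {"P": F(9, 14), "N": F(5, 14)}

# Assumed estimates P(attribute value | class), only for the values in the unseen record
likelihoods = {
    "P": {"Outlook=overcast": F(4, 9), "Temperature=mild": F(4, 9),
          "Humidity=normal": F(6, 9), "Windy=FALSE": F(6, 9)},
    "N": {"Outlook=overcast": F(0, 5), "Temperature=mild": F(2, 5),
          "Humidity=normal": F(1, 5), "Windy=FALSE": F(2, 5)},
}


def class_scores(priors, likelihoods):
    """Score each class as P(x1|c) * ... * P(xn|c) * P(c)."""
    return {c: priors[c] * prod(likelihoods[c].values()) for c in priors}


scores = class_scores(priors, likelihoods)
predicted = max(scores, key=scores.get)  # class with the highest score
for c, s in scores.items():
    print(c, s, float(s))
print("predicted class:", predicted)
```

Under these estimates no class N record has Outlook = overcast, so the score for N is zero and the record is labelled P, in line with the result shown on the slide.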

Related Issues

• Ensemble methods for classification


• An ensemble method induces a number of classification models and
combines the class predictions from the different models.
• Bagging and boosting
• Training outline:
1. Decide the number of base classification models k;
2. Draw a sample of the training set;
3. Decide and build a base model from the sample;
4. Repeat steps 2 and 3 until k base models are built.
• Classification outline:
1. Classify the unseen record with each base model;
2. The class of the unseen record is decided by a majority vote over the
classes assigned by the base models.
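
A bagging-style sketch of the two outlines, assuming each sample is a bootstrap sample (drawn with replacement, same size as the training set) and using a deliberately simple, hypothetical base model that predicts the majority class for one attribute's value. The data and all function names are made up for illustration.

```python
import random
from collections import Counter


def build_one_attribute_model(sample, attr="Outlook", class_key="Class"):
    """Hypothetical base model: majority class seen for the record's value of `attr`."""
    overall = Counter(r[class_key] for r in sample).most_common(1)[0][0]
    by_value = {}
    for r in sample:
        by_value.setdefault(r[attr], Counter())[r[class_key]] += 1
    table = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
    # Fall back to the sample's majority class for values missing from the sample
    return lambda record: table.get(record[attr], overall)


def bagging_train(records, k, build_model, seed=0):
    rng = random.Random(seed)                           # step 1: k is given
    models = []
    for _ in range(k):                                  # step 4: repeat until k models
        sample = rng.choices(records, k=len(records))   # step 2: draw a sample
        models.append(build_model(sample))              # step 3: build a base model
    return models


def bagging_classify(models, record):
    votes = Counter(model(record) for model in models)  # classify with each base model
    return votes.most_common(1)[0][0]                   # majority vote


# Made-up training records
records = [
    {"Outlook": "sunny", "Class": "N"},
    {"Outlook": "sunny", "Class": "N"},
    {"Outlook": "sunny", "Class": "P"},
    {"Outlook": "overcast", "Class": "P"},
    {"Outlook": "overcast", "Class": "P"},
    {"Outlook": "rainy", "Class": "P"},
    {"Outlook": "rainy", "Class": "N"},
]
models = bagging_train(records, k=5, build_model=build_one_attribute_model)
print(bagging_classify(models, {"Outlook": "overcast"}))
```

An odd k avoids tied votes in two-class problems; boosting differs in that later samples are weighted towards records the earlier models misclassified, which this sketch does not cover.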

Related Issues
• Class imbalance
• Most techniques have problems in classifying minority classes
• Accuracy does not reflect the true performance of a model
• Especially when the dataset is imbalanced, or
• When the classifier always outputs a single class
• More realistic evaluation measures:
• Precision (P): TP/(TP+FP)
• Recall (R): TP/(TP+FN)
• F1-measure: 2PR/(P+R)
• Increasing the proportion of minority classes
• Under-sampling the majority class
• Over-sampling the minority class
• Combination of the two
• Use of ensemble methods
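
The measures listed above can be computed from the counts of true positives, false positives and false negatives. The sketch below uses a made-up imbalanced test set to show how a classifier that always outputs the majority class gets high accuracy but zero precision, recall and F1 on the minority class; the function name is hypothetical and undefined ratios are reported as 0.0 by convention here.

```python
def precision_recall_f1(actual, predicted, positive):
    """Precision, recall and F1 for the class treated as positive."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return precision, recall, f1


# Made-up imbalanced test set: 95 majority records, 5 minority records
actual = ["maj"] * 95 + ["min"] * 5
always_majority = ["maj"] * 100  # classifier that always outputs a single class

accuracy = sum(a == p for a, p in zip(actual, always_majority)) / len(actual)
print("accuracy:", accuracy)
print("minority P/R/F1:", precision_recall_f1(actual, always_majority, positive="min"))
```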

Classification in Practice

• Process of a Classification Project


1. Locate data.
2. Prepare data.
3. Choose a classification technique.
4. Construct and tune the model.
5. Measure its accuracy and go back to step 3 or 4 until the accuracy is
acceptable.
6. Further evaluate the model from other aspects such as complexity,
comprehensibility, etc.
7. Deploy the model and test it in a real environment.
8. Further modify the model if necessary.

Useful references

• Chapter 7 of Data Mining Techniques and Applications
• Tan, P-N., Steinbach, M. and Kumar, V. (2006), Introduction to Data Mining,
Addison-Wesley, Chapter 5
