Association Rule Mining
Iyad Batal
Association Rules
Example for relational data
Rule: Smoke=T ∧ Family history=T ⇒ Lung cancer=T
sup(Smoke=T ∧ Family history=T ∧ Lung cancer=T) = 60/200 = 30%
conf(Smoke=T ∧ Family history=T ⇒ Lung cancer=T) = 60/100 = 60%
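As a concrete illustration (not part of the original slides), the following minimal Python sketch computes support and confidence for such a rule over a small, made-up relational table; the field names and records are purely illustrative.

# Minimal sketch with illustrative data, not from the slides.
records = [
    {"smoke": True,  "family_history": True,  "lung_cancer": True},
    {"smoke": True,  "family_history": True,  "lung_cancer": False},
    {"smoke": False, "family_history": True,  "lung_cancer": False},
    {"smoke": True,  "family_history": False, "lung_cancer": False},
]

def support(records, condition):
    # Fraction of records that satisfy `condition`.
    return sum(condition(r) for r in records) / len(records)

def confidence(records, antecedent, consequent):
    # sup(antecedent AND consequent) / sup(antecedent)
    both = sum(antecedent(r) and consequent(r) for r in records)
    return both / sum(antecedent(r) for r in records)

ante = lambda r: r["smoke"] and r["family_history"]
cons = lambda r: r["lung_cancer"]
print(support(records, lambda r: ante(r) and cons(r)))   # 0.25
print(confidence(records, ante, cons))                   # 0.5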
Frequent Pattern Mining
• Scalable mining methods. Two major approaches are covered in this lecture: Apriori and FP-growth.
Apriori
• The Apriori property:
– Any subset of a frequent pattern must be frequent.
– If {beer, chips, nuts} is frequent, so is {beer, chips}, i.e., every
transaction having {beer, chips, nuts} also contains {beer, chips}.
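To make the pruning role of the Apriori property concrete, here is a small hedged Python sketch of candidate generation: a candidate k-itemset is kept only if all of its (k-1)-subsets are frequent. (It enumerates candidates from item combinations rather than the usual self-join, which is less efficient but illustrates the same pruning.)

from itertools import combinations

def apriori_gen(frequent_prev, k):
    # frequent_prev: frozensets of size k-1 that are known to be frequent.
    prev = set(frequent_prev)
    items = sorted({i for s in prev for i in s})
    candidates = set()
    for combo in combinations(items, k):
        c = frozenset(combo)
        # Apriori property: keep c only if every (k-1)-subset of c is frequent.
        if all(frozenset(s) in prev for s in combinations(c, k - 1)):
            candidates.add(c)
    return candidates

frequent_2 = [frozenset(p) for p in
              [("beer", "chips"), ("beer", "nuts"), ("chips", "nuts")]]
print(apriori_gen(frequent_2, 3))   # {frozenset({'beer', 'chips', 'nuts'})}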
FP-growth
• The FP-growth algorithm: mining frequent patterns without candidate
generation [Han, Pei & Yin 2000]
• Compress a large database into a compact Frequent-Pattern tree (FP-
tree) structure
– highly condensed, but complete for frequent pattern mining
– avoid costly database scans
• Develop an efficient, FP-tree-based frequent pattern mining method
– A divide-and-conquer methodology: decompose mining tasks into
smaller ones
– Avoid candidate generation: sub-database test only!
FP-growth
Constructing the FP-tree
TID   Items bought                 (Ordered) frequent items
100   {f, a, c, d, g, i, m, p}     {f, c, a, m, p}
200   {a, b, c, f, l, m, o}        {f, c, a, b, m}
300   {b, f, h, j, o}              {f, b}
400   {b, c, k, s, p}              {c, b, p}
500   {a, f, c, e, l, p, m, n}     {f, c, a, m, p}

Steps (with min_sup = 3):
1. Scan the database once to find the frequent 1-itemsets and their counts.
2. Sort the frequent items in descending frequency order; this gives the f-list (here f, c, a, b, m, p) and the header table.
3. Scan the database again: for each transaction, keep only the frequent items, order them by the f-list, and insert the resulting item sequence as a path into the FP-tree (rooted at the empty node {}), sharing prefixes with existing paths and incrementing their counts.
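The following Python sketch (not from the slides; a minimal reading of the construction steps above) builds such an FP-tree with parent pointers and header-table node-links. Ties between items of equal frequency are broken alphabetically here, so the relative order of f and c may differ from the slide's f-list.

class FPNode:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_fp_tree(transactions, min_sup):
    # Pass 1: count item frequencies and keep the frequent items.
    counts = {}
    for t in transactions:
        for item in t:
            counts[item] = counts.get(item, 0) + 1
    frequent = {i: c for i, c in counts.items() if c >= min_sup}

    root = FPNode(None, None)
    header = {i: [] for i in frequent}   # item -> list of nodes (node-links)

    # Pass 2: insert each transaction's frequent items in f-list order.
    for t in transactions:
        ordered = sorted((i for i in t if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            if item not in node.children:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            node = node.children[item]
            node.count += 1
    return root, header, frequent

transactions = [list("facdgimp"), list("abcflmo"), list("bfhjo"),
                list("bcksp"), list("afcelpmn")]
root, header, freq = build_fp_tree(transactions, min_sup=3)
print(freq)   # {'f': 4, 'c': 4, 'a': 3, 'b': 3, 'm': 3, 'p': 3} (order may vary)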
FP-growth
Step 1: From FP-tree to Conditional Pattern Base
• Start at each frequent item in the header table of the FP-tree
• Traverse the FP-tree by following the node-links of that item
• Accumulate all of the transformed prefix paths of that item to form its
conditional pattern base
Header table (item : count): f:4, c:4, a:3, b:3, m:3, p:3

Conditional pattern bases:
Item   Conditional pattern base
c      f:3
a      fc:3
b      fca:1, f:1, c:1
m      fca:2, fcab:1
p      fcam:2, cb:1

[Figure: the FP-tree rooted at {}, with the main path f:4–c:3–a:3–m:2–p:2, a b:1–m:1 branch under a:3, a b:1 child under f:4, and a separate c:1–b:1–p:1 branch, all reachable through node-links from the header table.]
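Assuming the FPNode/header representation sketched earlier (parent pointers plus node-links), the conditional pattern base of an item can be collected by climbing from each of its nodes towards the root:

def conditional_pattern_base(item, header):
    # Follow the item's node-links; each prefix path is weighted by the node's count.
    base = []
    for node in header[item]:
        path, parent = [], node.parent
        while parent is not None and parent.item is not None:
            path.append(parent.item)
            parent = parent.parent
        if path:
            base.append((list(reversed(path)), node.count))
    return base

# On the example tree this yields m's base fca:2 and fcab:1
# (up to how the tie between f and c was broken when building the tree).
print(conditional_pattern_base("m", header))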
FP-growth
Step 2: Construct Conditional FP-tree
• Start from the end of the f-list (the least frequent item)
• For each pattern-base
– Accumulate the count for each item in the base
– Construct the FP-tree for the frequent items of the pattern base
• Example: Here we are mining for pattern m, min_sup=3
m-conditional pattern base: fca:2, fcab:1

Accumulating the counts in this base gives f:3, c:3, a:3, b:1. With min_sup = 3, item b is dropped,
so the m-conditional FP-tree is the single path {} – f:3 – c:3 – a:3.

All frequent patterns concerning m:
m, fm, cm, am, fcm, fam, cam, fcam
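When a conditional FP-tree degenerates to a single path, as it does here for m, every subset of the path combined with the suffix item is a frequent pattern. A small sketch of that special case (the full FP-growth algorithm recurses on each conditional tree instead):

from itertools import combinations

def patterns_from_single_path(path_items, suffix):
    # Every subset of the single path, unioned with the suffix, is a frequent pattern.
    patterns = []
    for k in range(len(path_items) + 1):
        for subset in combinations(path_items, k):
            patterns.append(set(subset) | set(suffix))
    return patterns

print(patterns_from_single_path(["f", "c", "a"], ["m"]))
# m, fm, cm, am, fcm, fam, cam, fcam  (8 patterns)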
FP-growth
FP-growth vs. Apriori: Scalability With the Support Threshold
[Figure: run time (sec.) vs. support threshold (%) for FP-growth and Apriori on dataset D1.]
Correlation analysis
• Association rule mining often generates a huge number of rules, but most of them are either redundant or do not reflect the true correlation among the data objects.
Correlation analysis
• play basketball ⇒ eat cereal [40%, 66.7%] is misleading
– The overall percentage of students eating cereal is 75%, which is higher than 66.7%.
• play basketball ⇒ not eat cereal [20%, 33.3%] is more accurate, although it has lower support and confidence.
Contingency table
             Basketball    Not basketball   Sum (row)
Cereal       2000 (40%)    1750 (35%)       3750 (75%)
Not cereal   1000 (20%)    250 (5%)         1250 (25%)
Sum (col.)   3000 (60%)    2000 (40%)       5000 (100%)
Correlation analysis
The lift score
lift(A ⇒ B) = P(A ∧ B) / (P(A) · P(B)) = P(B | A) / P(B)

• Lift = 1: A and B are independent
• Lift > 1: A and B are positively correlated
• Lift < 1: A and B are negatively correlated.

The chi-square statistic compares the observed count O(r) of each cell r of the contingency table with the count E[r] expected under independence:

χ² = Σ (O(r) − E[r])² / E[r]
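A small, hedged Python sketch of both measures, checked against the basketball/cereal table above (lift ≈ 0.89 < 1, i.e. negatively correlated):

def lift(n_ab, n_a, n_b, n):
    # lift(A => B) = P(A and B) / (P(A) * P(B))
    return (n_ab / n) / ((n_a / n) * (n_b / n))

def chi_square(observed, row_totals, col_totals, n):
    # Pearson chi-square over the cells of a contingency table.
    chi2 = 0.0
    for i, row_total in enumerate(row_totals):
        for j, col_total in enumerate(col_totals):
            expected = row_total * col_total / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2

n = 5000
print(lift(2000, 3000, 3750, n))                  # ~0.889 -> negatively correlated
print(chi_square([[2000, 1750], [1000, 250]],
                 [3750, 1250], [3000, 2000], n))  # ~277.8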
Constraint-based Mining
• Constraint-based mining
– User flexibility: the user provides constraints on what is to be mined
• Specify the task-relevant data, the relevant attributes, rule templates, additional constraints…
– System optimization: the system exploits such constraints for efficient, constraint-based mining.
Constraint-based Mining
• Anti-monotonic constraints are very important because they can greatly speed up the mining process: if an itemset violates an anti-monotone constraint (e.g. sup(X) ≥ min_sup, or sum(X.price) ≤ v with non-negative prices), then every superset of it also violates the constraint, so such itemsets can be pruned just like infrequent ones in Apriori (see the sketch below).
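A minimal Python sketch of such pruning, using a hypothetical price table and a budget constraint sum(price) ≤ v, which is anti-monotone when prices are non-negative:

prices = {"beer": 6, "chips": 3, "nuts": 4, "soda": 2}   # illustrative prices

def satisfies_budget(itemset, max_total=10):
    # sum(price) <= v is anti-monotone for non-negative prices:
    # once an itemset exceeds the budget, every superset does too.
    return sum(prices[i] for i in itemset) <= max_total

candidates = [frozenset(s) for s in
              [("beer", "chips"), ("beer", "nuts"), ("beer", "chips", "nuts")]]
# Violating candidates can be dropped before counting support, and none of
# their supersets ever need to be generated.
print([sorted(c) for c in candidates if satisfies_budget(c)])
# [['beer', 'chips'], ['beer', 'nuts']]  -- {beer, chips, nuts} (total 13) is pruned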
Association Rule Mining
• Association Rules and Frequent Patterns
• Frequent Pattern Mining Algorithms
– Apriori
– FP-growth
• Correlation Analysis
• Constraint-based Mining
• Using Frequent Patterns for Classification
– Associative Classification (rule-based classification)
– Frequent Pattern-based Classification
Associative classification
• Associative classification: build a rule-based classifier from
association rules.
• This approach overcomes some limitations of greedy methods (e.g. decision trees, sequential covering algorithms), which consider only one attribute at a time; associative classifiers have been found to be more accurate than C4.5.
• Build class association rules:
– Association rules in general can have any number of items in the
consequent.
– Class association rules set the consequent to be the class label.
• Example: Age=youth ∧ Credit=OK ⇒ buys_computer=yes
[sup=20%, conf=90%]
Associative classification
CBA
• CBA (Classification Based on Associations) [Liu et al., 1998]
• Use the Apriori algorithm to mine the class association rules.
• Classification:
– Sort the rules by decreasing confidence, breaking ties by support.
– Classify a new example x by the first rule whose antecedent x satisfies.
– The rule list ends with a default rule (with lowest precedence).
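A minimal sketch of this classification step, assuming rules are represented as (antecedent item set, class label, confidence, support) tuples; CBA additionally breaks remaining ties by rule generation order, which is omitted here:

rules = [
    ({"age=youth", "credit=OK"}, "buys_computer=yes", 0.90, 0.20),
    ({"age=senior"},             "buys_computer=no",  0.75, 0.10),
]
default_class = "buys_computer=yes"   # lowest-precedence default rule

def cba_classify(example_items, rules, default_class):
    # Order rules by confidence, then support; return the class of the
    # first rule whose antecedent is contained in the example.
    ordered = sorted(rules, key=lambda r: (r[2], r[3]), reverse=True)
    for antecedent, label, _, _ in ordered:
        if antecedent <= example_items:
            return label
    return default_class

print(cba_classify({"age=youth", "credit=OK", "student=yes"}, rules, default_class))
# buys_computer=yes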
Associative classification
CMAR
• CMAR (Classification based on Multiple Association rules) [Li et al 2001]
• Use the FP-growth algorithm to mine the class association rules.
• Employs the CR-tree structure (a prefix tree for indexing the rules) to efficiently store and retrieve rules.
• Apply rule pruning whenever a rule is inserted in the tree:
– If R1 is more general than R2 and conf(R1)>conf(R2): R2 is pruned
– All rules for which the antecedent and class are not positively
correlated (χ2 test) are also pruned.
• CMAR considers multiple rules when classifying an instance and uses a weighted (χ²-based) measure to find the strongest class.
• CMAR is slightly more accurate and more efficient than CBA.
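As an illustration of the first pruning step listed above (not CMAR's exact implementation, which applies the check on the CR-tree as rules are inserted), a rule R2 is dropped when some more general rule R1, i.e. one whose antecedent is a proper subset of R2's, has higher confidence:

def prune_specific_rules(rules):
    # rules: list of (antecedent set, class label, confidence).
    kept = []
    for ante2, cls2, conf2 in rules:
        dominated = any(ante1 < ante2 and conf1 > conf2   # proper subset, higher conf
                        for ante1, _, conf1 in rules)
        if not dominated:
            kept.append((ante2, cls2, conf2))
    return kept

rules = [({"a"}, "c1", 0.90),
         ({"a", "b"}, "c1", 0.80),
         ({"a", "c"}, "c2", 0.95)]
print(prune_specific_rules(rules))
# ({'a', 'b'}, 'c1', 0.80) is dominated by ({'a'}, 'c1', 0.90) and is pruned.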
Associative classification
Harmony
• Drawback of CBA and CMAR is that the number of rules can be
extremely large.
• Harmony [Wang et al, 2005] adopts an instance-centric approach:
– Find the highest confidence rule for each training instance.
– Build the classification model from the union of these rules.
• Use the FP-growth algorithm to mine the rules.
• Efficient mining:
– Naïve way: mine all frequent patterns and then extract the
highest confidence rule for each instance.
– Harmony employs efficient pruning methods to accelerate the
rule discovery: the pruning methods are incorporated within the
FP-growth algorithm.
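A hedged sketch of the naive instance-centric selection described above (Harmony itself pushes the pruning inside FP-growth rather than mining all rules first); rules are assumed to be (antecedent set, class label, confidence) tuples:

def instance_centric_select(training_data, rules):
    # For every training instance, keep its highest-confidence covering rule
    # with the correct class; the model is the union of the selected rules.
    selected = set()
    for items, label in training_data:
        covering = [r for r in rules if r[0] <= items and r[1] == label]
        if covering:
            ante, cls, conf = max(covering, key=lambda r: r[2])
            selected.add((frozenset(ante), cls, conf))
    return selected

training_data = [({"a", "b"}, "c1"), ({"b", "c"}, "c2")]
rules = [({"a"}, "c1", 0.7), ({"a", "b"}, "c1", 0.9), ({"c"}, "c2", 0.8)]
print(instance_centric_select(training_data, rules))
# selects ({'a','b'}, 'c1', 0.9) and ({'c'}, 'c2', 0.8)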
Frequent pattern-based classification
• The classification model is built in the feature space of single
features as well as frequent patterns, i.e. map the data to a higher
dimensional space.
• Feature combination can capture more underlying semantics than
single features.
• Example: word phrases can improve the accuracy of document
classification.
• FP-based classification has been applied to many problems:
– Graph classification
– Time series classification
– Protein classification
– Text classification
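A minimal sketch of this feature mapping: each instance (viewed as an item set) is turned into a binary vector with one indicator per single item and one per frequent pattern, and the resulting matrix can be fed to any standard classifier:

def pattern_features(instances, patterns):
    # One binary feature per single item plus one per frequent pattern.
    single_items = sorted({i for inst in instances for i in inst})
    features = [frozenset([i]) for i in single_items] + [frozenset(p) for p in patterns]
    X = [[1 if f <= inst else 0 for f in features] for inst in instances]
    return X, features

instances = [{"a", "b", "c"}, {"a", "c"}, {"b"}]
patterns = [{"a", "c"}, {"a", "b"}]        # illustrative frequent patterns
X, feats = pattern_features(instances, patterns)
print(X)   # [[1, 1, 1, 1, 1], [1, 0, 1, 1, 0], [0, 1, 0, 0, 0]]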
Frequent pattern-based classification
• Naïve solution: given a dataset with n items (attribute-value pairs), enumerate all 2^n itemsets and use them as features for classification.
• Problems:
– Computationally infeasible.
– Overfitting the classifier.
• Solution: use only frequent patterns. Why?
• [Cheng et al. 2007] showed that:
– low-support features are not very useful for classification.
– The discriminative power of a pattern is closely related to its
support.
– They derived an upper bound for information gain as a function
of the support.
Frequent pattern-based classification
[Figure: upper bound of information gain (IG) as a function of the support.]
Tree-based frequent patterns
3. The discriminative power of each pattern is evaluated against the complete dataset (in the second phase), not on the subset of examples that the already-chosen patterns fail to predict well.
[Figure: after the split along the solid line is chosen, a split along the dashed line makes the groups purer; such a split cannot be chosen by the batch mode.]
Tree-based frequent patterns
• Method: Construct a decision tree using frequent patterns:
– Apply frequent pattern mining on the whole data
– Select the best frequent pattern P to divide the data into two sets:
one containing P and the other not.
– Repeat the procedure on each subset until a stopping condition is
satisfied.
• Decreasing the support threshold as the partitions get smaller allows the algorithm to mine patterns with very low global support.
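A hedged sketch of this tree construction under the assumptions above: `mine_patterns` stands for any routine that mines frequent patterns from the current partition (its support threshold can be lowered as the partitions shrink), and information gain is used to pick the best splitting pattern.

import math

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in (labels.count(l) for l in set(labels)))

def build_tree(data, mine_patterns, min_size=2):
    # data: list of (item set, class label) pairs.
    labels = [y for _, y in data]
    if len(data) < min_size or len(set(labels)) == 1:
        return max(set(labels), key=labels.count)        # leaf: majority class
    best, best_gain = None, 0.0
    for p in mine_patterns(data):
        left = [(x, y) for x, y in data if p <= x]       # instances containing p
        right = [(x, y) for x, y in data if not p <= x]
        if not left or not right:
            continue
        gain = (entropy(labels)
                - len(left) / len(data) * entropy([y for _, y in left])
                - len(right) / len(data) * entropy([y for _, y in right]))
        if gain > best_gain:
            best, best_gain = p, gain
    if best is None:
        return max(set(labels), key=labels.count)
    left = [(x, y) for x, y in data if best <= x]
    right = [(x, y) for x, y in data if not best <= x]
    return (best,
            build_tree(left, mine_patterns, min_size),
            build_tree(right, mine_patterns, min_size))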