CH 03 Frequent Pattern Mining 2021
CH 03 Frequent Pattern Mining 2021
CH 03 Frequent Pattern Mining 2021
• Given:
― a set I of all the items;
― a database D of transactions;
― minimum support s;
― minimum confidence c;
• Find:
― all association rules X Y with a minimum
support s and confidence c.
Problem Decomposition
L1 = {frequent items};
for (k = 1; Lk !=; k++) do
Ck+1 = candidates generated from Lk;
for each transaction t in database do
increment the count of all candidates in
Ck+1 that are contained in t
Lk+1 = candidates in Ck+1 with min_support
return k Lk;
The Apriori Algorithm — Example 1
Database D
L = { O, K, E}
1 A,B,D
2 A,D
3 A,C
4 B,D,E,F
Apriori Example 4
Tid items
Find frequent item set and
1 ABD strong association rule
2 BCD
3 AB
4 BD
5 ABC
• Generating C4 from L3
– abcd from abc and abd
– acde from acd and ace
• Pruning:
– acde is removed because ade is not in L3
• C4={abcd}
Example of Discovering Rules
Let use consider the 3-itemset {I1, I2, I5}:
I1 I2 I5
I1 I5 I2
I2 I5 I1
I1 I2 I5
I2 I1 I5
I5 I1 I2
Discovering Rules
BUT HOW?
FP Growth
• Simply a two step procedure
– Step 1: Build a compact data structure called
the FP-tree
• Built using 2 passes over the data-set.
– Step 2: Extracts frequent item sets directly
from the FP-tree
null
I2:1
I1:1
I5:1
For Transaction:
I2,I4
null
I2:2
I1:1
I4:1
I5:1
For Transaction:
I2,I3
null
I2:3
I1:1 I4:1
I3:1
I5:1
For Transaction:
I2,I1,I4
null
I2:4
I1:2 I4:1
I3:1
I5:1 I4:1
For Transaction:
I1,I3
null
I2:4 I1:1
I3:1
I1:2 I3:1 I4:1
I5:1 I4:1
For Transaction:
I2,I3
null
I2:5 I1:1
I3:1
I1:2 I3:2 I4:1
I5:1 I4:1
For Transaction:
I1,I3
null
I2:5 I1:2
I3:2
I1:2 I3:2 I4:1
I5:1 I4:1
For Transaction:
I2,I1,I3,I5
null
I2:6 I1:2
I3:2
I1:3 I3:2 I4:1
I5:1
For Transaction:
I2,I1,I3
null
I2:7 I1:2
I3:2
I1:4 I3:2 I4:1
I5:1
To facilitate tree traversal, an
item header table is built so
that each item points to its null
occurrences in the tree via a
chain of node-links.
I2 7 I1:2
I2:7
I1 6
I3 6
I4 2
I4:1 I3:2
I5 2 I1:4 I3:2
I3:2
I5:1 I4:1
I5:1
FP Growth
• FP Tree Construction Over!!
Now we need to find conditional pattern
base and Conditional FP Tree for each
item
Frequent Patters Generated
Item Conditional Pattern Conditional FP-tree Frequent Pattern Generated
Base
I3 {I2,I1:2},{I2:2},{I1:2} {I2:4,I1:2},{I1:2} {I2, I3: 4}, {I1, I3: 4}, {I2, I1,
I3: 2}
I2
Ignore as no Branch
Introduction to Market Basket Analysis
• Def:Market Basket Analysis (Association
Analysis) is a mathematical modeling
technique based upon the theory that if you
buy a certain group of items, you are likely to
buy another group of items.
• Data constraint
– specify the set of task relevant data
— using SQL-like queries
• find product pairs sold together in stores in Chicago
in Dec.’02
• Dimension/level constraint
– Specify the desired dimension of data, or levels of
the concept hierarchies to be mined
– in relevance to region, price, brand, customer
category
• Rule (or pattern) constraint
– Specify the form of rule to be mined (metarule-
rule template)
– small sales (price < $10) triggers big sales (sum >
$200)
• Interestingness constraint
– Specify thresholds such as support, confidence
– strong rules: min_support 3%, min_confidence
60%
Mining Various Kinds of Association
Rules
Comp Printer TV