Apriori
Key Concepts:
• Frequent Itemsets: The sets of items that have minimum support (denoted by Lk for the frequent k-itemsets).
• Apriori Property: Any subset of a frequent itemset must also be frequent.
• Join Operation: To find Lk, a set of candidate k-itemsets is generated by joining Lk-1 with itself.
The Apriori Algorithm in a Nutshell
1. Find the frequent itemsets: the sets of items that have minimum support.
2. A subset of a frequent itemset must also be a frequent itemset; i.e., if {A, B} is a frequent itemset, both {A} and {B} must also be frequent itemsets.
3. Iteratively find frequent itemsets with cardinality from 1 to k (k-itemsets); the sketch after this list shows the loop.
4. Use the frequent itemsets to generate association rules.
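The following is a minimal sketch of this loop in Python, not code from any particular library; the names apriori and min_sup_count are illustrative, and transactions are represented as sets of items.

```python
from itertools import combinations

def apriori(transactions, min_sup_count):
    """Return a dict mapping each frequent itemset (frozenset) to its support count."""
    # L1: count every single item and keep those meeting minimum support.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_sup_count}
    all_frequent = dict(frequent)

    k = 2
    while frequent:
        # Join step: union two frequent (k-1)-itemsets into a k-itemset.
        prev = list(frequent)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                # Prune step (Apriori property): every (k-1)-subset must be frequent.
                if len(union) == k and all(
                        frozenset(sub) in frequent
                        for sub in combinations(union, k - 1)):
                    candidates.add(union)
        # Scan the database to count each surviving candidate.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_sup_count}
        all_frequent.update(frequent)
        k += 1
    return all_frequent
```

The join and prune inside the loop are exactly points 2 and 3 above: candidates are built from frequent (k-1)-itemsets and discarded as soon as any (k-1)-subset is infrequent.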
The Apriori Algorithm: Example

TID   List of Items
T1    I1, I2, I5
T2    I2, I4
T3    I2, I3
T4    I1, I2, I4
T5    I1, I3
T6    I2, I3
T7    I1, I3
T8    I1, I2, I3, I5
T9    I1, I2, I3

• Consider a database, D, consisting of the 9 transactions above.
• Suppose the minimum support count required is 2 (i.e., min_sup = 2/9 ≈ 22%).
• Let the minimum confidence required be 70%.
• We first find the frequent itemsets using the Apriori algorithm.
• Then, association rules are generated using min. support and min. confidence.
The Apriori Algorithm: Example
Step 1: Generating 1-itemset Frequent Pattern

Scan D for the count of each candidate to obtain C1, then compare each candidate's support count with the minimum support count to obtain L1:

C1 = L1:
Itemset   Support Count
{I1}      6
{I2}      7
{I3}      6
{I4}      2
{I5}      2

The set of frequent 1-itemsets, L1, consists of the candidate 1-itemsets satisfying minimum support; here every candidate qualifies, so L1 = C1. The snippet below reproduces these counts.
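A quick check of these counts, with the example database written as Python sets (the variable names D, C1, L1 mirror the slides):

```python
from collections import Counter

D = [{'I1','I2','I5'}, {'I2','I4'}, {'I2','I3'}, {'I1','I2','I4'},
     {'I1','I3'}, {'I2','I3'}, {'I1','I3'}, {'I1','I2','I3','I5'},
     {'I1','I2','I3'}]

C1 = Counter(item for t in D for item in t)          # support count per item
L1 = {item: c for item, c in C1.items() if c >= 2}   # keep items with count >= 2
print(sorted(L1.items()))
# [('I1', 6), ('I2', 7), ('I3', 6), ('I4', 2), ('I5', 2)]
```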
Step 2: Generating 2-itemset Frequent Pattern

Generate the candidate 2-itemsets C2 from L1 (L1 Join L1), then scan D for the count of each candidate:

C2:
Itemset    Support Count
{I1,I2}    4
{I1,I3}    4
{I1,I4}    1
{I1,I5}    2
{I2,I3}    4
{I2,I4}    2
{I2,I5}    2
{I3,I4}    0
{I3,I5}    1
{I4,I5}    0

Compare each candidate's support count with the minimum support count to obtain L2:

L2:
Itemset    Support Count
{I1,I2}    4
{I1,I3}    4
{I1,I5}    2
{I2,I3}    4
{I2,I4}    2
{I2,I5}    2
The Apriori Algorithm: Example
Step 2: Generating 2-itemset Frequent Pattern
• To discover the set of frequent 2-itemsets, L2, the algorithm uses L1 Join L1 to generate a candidate set of 2-itemsets, C2 (see the snippet below).
• Next, the transactions in D are scanned and the support count for each candidate itemset in C2 is accumulated (as shown in the C2 table above).
• The set of frequent 2-itemsets, L2, is then determined, consisting of those candidate 2-itemsets in C2 having minimum support.
Note: We haven't used the Apriori property yet.
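Continuing the Step 1 snippet (this reuses D and L1 from there), the join, scan, and filter for this step can be written as:

```python
from itertools import combinations

# L1 Join L1: every 2-item combination of frequent items is a candidate.
C2 = [frozenset(pair) for pair in combinations(sorted(L1), 2)]

# Scan D for each candidate's count, then keep those meeting min support.
counts = {c: sum(1 for t in D if c <= t) for c in C2}
L2 = {c: n for c, n in counts.items() if n >= 2}
# L2: {I1,I2}:4, {I1,I3}:4, {I1,I5}:2, {I2,I3}:4, {I2,I4}:2, {I2,I5}:2
```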
Step 3: Generating 3-itemset Frequent Pattern

C3 (after the join and prune steps, with counts from scanning D):
Itemset       Support Count
{I1,I2,I3}    2
{I1,I2,I5}    2

Compare each candidate's support count with the minimum support count to obtain L3:

L3:
Itemset       Support Count
{I1,I2,I3}    2
{I1,I2,I5}    2
The Apriori Algorithm: Example
The generation of the set of candidate 3-itemsets, C3, involves use of the Apriori property.
• To find C3, we compute L2 Join L2.
• C3 = L2 Join L2 = {{I1, I2, I3}, {I1, I2, I5}, {I1, I3, I5}, {I2, I3, I4}, {I2, I3, I5}, {I2, I4, I5}}.
• Now the join step is complete, and the prune step is used to reduce the size of C3. The prune step helps avoid heavy computation due to a large Ck.
The Apriori Algorithm: Example
• For example, let's take {I1, I2, I3}. Its 2-item subsets are {I1, I2}, {I1, I3} and {I2, I3}. Since all 2-item subsets of {I1, I2, I3} are members of L2, we keep {I1, I2, I3} in C3.
• Let's take another example, {I2, I3, I5}, which shows how the pruning is performed. Its 2-item subsets are {I2, I3}, {I2, I5} and {I3, I5}.
• But {I3, I5} is not a member of L2, hence it is not frequent, which violates the Apriori property. Thus we have to remove {I2, I3, I5} from C3.
• Therefore, C3 = {{I1, I2, I3}, {I1, I2, I5}} after checking all members of the join result for pruning (see the sketch below).
• Now, the transactions in D are scanned to determine L3, consisting of those candidate 3-itemsets in C3 having minimum support.
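The prune step for this example can be sketched as follows, reusing L2 from the Step 2 snippet:

```python
from itertools import combinations

# Result of L2 Join L2, before pruning.
raw_C3 = [frozenset(s) for s in ({'I1','I2','I3'}, {'I1','I2','I5'},
                                 {'I1','I3','I5'}, {'I2','I3','I4'},
                                 {'I2','I3','I5'}, {'I2','I4','I5'})]

# Apriori property: keep a candidate only if every 2-item subset is in L2.
C3 = [c for c in raw_C3
      if all(frozenset(sub) in L2 for sub in combinations(c, 2))]
# C3 == [{I1,I2,I3}, {I1,I2,I5}]
```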
Step 4: Generating 4-itemset Frequent Pattern
The algorithm uses L3 Join L3 to generate a candidate set of 4-itemsets, C4. Although the join results in {{I1, I2, I3, I5}}, this itemset is pruned since its subset {I2, I3, I5} is not frequent.
• Thus, C4 = ∅, and the algorithm terminates, having found all of the frequent itemsets. This completes the Apriori algorithm.
Step 5: Generating Association Rules from Frequent Itemsets
Procedure:
• For each frequent itemset l, generate all nonempty proper subsets of l.
• For every nonempty proper subset s of l, output the rule "s ⇒ (l − s)" if support_count(l) / support_count(s) >= min_conf, where min_conf is the minimum confidence threshold (see the sketch after this procedure).
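A minimal sketch of this procedure, assuming support is a dict from frozenset itemsets to support counts (as returned by the apriori sketch above):

```python
from itertools import combinations

def generate_rules(support, min_conf):
    """Yield (antecedent, consequent, confidence) for every strong rule.

    By the Apriori property, every subset of a frequent itemset is itself
    frequent, so support[s] is always present.
    """
    for l in support:
        if len(l) < 2:  # a rule needs nonempty itemsets on both sides
            continue
        for r in range(1, len(l)):                       # proper subsets only
            for s in map(frozenset, combinations(l, r)):
                conf = support[l] / support[s]
                if conf >= min_conf:
                    yield set(s), set(l - s), conf
```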
Example
• We had L = {{I1}, {I2}, {I3}, {I4}, {I5}, {I1,I2}, {I1,I3}, {I1,I5}, {I2,I3}, {I2,I4}, {I2,I5}, {I1,I2,I3}, {I1,I2,I5}}.
• Let's take l = {I1, I2, I5}.
• All its nonempty proper subsets are {I1,I2}, {I1,I5}, {I2,I5}, {I1}, {I2}, {I5}.
Step 5: Generating Association Rules from Frequent Itemsets
– R1: I1 ∧ I2 ⇒ I5
• Confidence = sc{I1,I2,I5}/sc{I1,I2} = 2/4 = 50%
• R1 is Rejected.
– R2: I1 ∧ I5 ⇒ I2
• Confidence = sc{I1,I2,I5}/sc{I1,I5} = 2/2 = 100%
• R2 is Selected.
– R3: I2 ∧ I5 ⇒ I1
• Confidence = sc{I1,I2,I5}/sc{I2,I5} = 2/2 = 100%
• R3 is Selected.
– R4: I1 ⇒ I2 ∧ I5
• Confidence = sc{I1,I2,I5}/sc{I1} = 2/6 = 33%
• R4 is Rejected.
– R5: I2 ⇒ I1 ∧ I5
• Confidence = sc{I1,I2,I5}/sc{I2} = 2/7 = 29%
• R5 is Rejected.
– R6: I5 ⇒ I1 ∧ I2
• Confidence = sc{I1,I2,I5}/sc{I5} = 2/2 = 100%
• R6 is Selected.
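These six confidences can be checked directly; sc below holds the support counts from the worked example, and support_count(l) = 2:

```python
from itertools import combinations

# Support counts from the worked example.
sc = {frozenset(s): c for s, c in [
    ({'I1'}, 6), ({'I2'}, 7), ({'I5'}, 2),
    ({'I1', 'I2'}, 4), ({'I1', 'I5'}, 2), ({'I2', 'I5'}, 2)]}

l = frozenset({'I1', 'I2', 'I5'})
for r in (1, 2):
    for s in map(frozenset, combinations(sorted(l), r)):
        conf = 2 / sc[s]  # support_count(l) = 2
        verdict = 'Selected' if conf >= 0.70 else 'Rejected'
        print(sorted(s), '=>', sorted(l - s), f'{conf:.0%}', verdict)
# Only {I1,I5} => {I2}, {I2,I5} => {I1}, and {I5} => {I1,I2} reach 70%.
```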
The Apriori Algorithm: Exercise
A database has five transactions. Let min_sup = 60% and min_conf = 80%.

TID     Items_bought
T100    {M, O, N, K, E, Y}
T200    {D, O, N, K, E, Y}
T300    {M, A, K, E}
T400    {M, U, C, K, Y}
T500    {C, O, O, K, I, E}

(a) Find all frequent itemsets using the Apriori algorithm.
(b) List all of the strong association rules (with support s and confidence c).
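One way to check your answers is to reuse the apriori sketch from the start of this section: 60% of 5 transactions gives a minimum support count of 3, and duplicate items (the second O in T500) collapse because itemsets are sets.

```python
# Transactions written as sets of single-letter items.
D_ex = [set('MONKEY'), set('DONKEY'), set('MAKE'), set('MUCKY'), set('COOKIE')]
frequent = apriori(D_ex, min_sup_count=3)
for itemset, count in sorted(frequent.items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```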
The Apriori Algorithm: Improvement
Hash-Based Technique: This method uses a hash-based structure called a hash table for generating the candidate k-itemsets and their corresponding counts. It uses a hash function to generate the table: while scanning D, each itemset is hashed into a bucket, and a candidate whose bucket count falls below the minimum support count cannot be frequent, so it can be pruned early.
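A sketch of this idea for 2-itemsets, in the style of the hash-based (DHP) refinement; the table size NUM_BUCKETS and the function names are illustrative assumptions, not part of the original slides:

```python
from itertools import combinations

NUM_BUCKETS = 7  # illustrative table size

def bucket(pair):
    # Hash a 2-itemset into one of NUM_BUCKETS buckets.
    return hash(tuple(sorted(pair))) % NUM_BUCKETS

def hash_filtered_C2(transactions, L1_items, min_sup_count):
    # While scanning D, hash every 2-itemset of every transaction and
    # accumulate per-bucket counts.
    buckets = [0] * NUM_BUCKETS
    for t in transactions:
        for pair in combinations(sorted(t), 2):
            buckets[bucket(pair)] += 1
    # A pair can be frequent only if its bucket count meets minimum support,
    # so low buckets prune candidates before the expensive counting scan.
    return [frozenset(p) for p in combinations(sorted(L1_items), 2)
            if buckets[bucket(p)] >= min_sup_count]
```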