Lecture 5 - FP-Growth Algorithm

FP-Growth Algorithm
• Use the Frequent Pattern Tree (FP-Tree) as a compact representation of
the input database
• The root node is labelled null
• Each other node holds an item name, a support counter, a node-link
pointer (refers to the next node in the tree that carries the same item
name) and a parent-link pointer (refers to the parent of the current node)
• A header table serves as an index to the FP-Tree: each entry consists
of a frequent item name, the item's support count and a pointer to the
head of that item's node-link chain
• Only two scans of the database are required to construct the FP-Tree
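The node and header-table fields described above can be sketched as a small Python structure. This is a hypothetical layout (the names `FPNode`, `node_link` and `header` are not from the lecture), just to make the pointers concrete:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class FPNode:
    item: Optional[int]                    # item name; None for the root
    count: int = 0                         # support counter
    parent: Optional["FPNode"] = None      # parent-link pointer
    node_link: Optional["FPNode"] = None   # next node with the same item name
    children: dict = field(default_factory=dict)

# Header table: one entry per frequent item, holding its support count
# and the head of that item's node-link chain (filled in as nodes appear).
root = FPNode(item=None)
header = {2: {"support": 3, "head": None}}
```

The node-link chain is what lets the mining phase visit every occurrence of an item without searching the whole tree.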
Phase 1: FP-Tree construction
1. Construct an FP-Tree with a header table
Phase 2: Mining frequent itemsets
1. Mine frequent itemsets from the FP-Tree by pattern
fragment growth
Phase 1: FP-Tree Construction
• FP-Tree construction algorithm
1. Collect individual items, count their supports and obtain a
list of frequent items;
2. Sort the items in descending order of support and save
them to header table L;
3. Create the root node for tree T and label it null;
4. For each transaction t do the following:
a) Remove infrequent items from t and sort the rest in the
same order as in L;
b) For each item of the transaction, in that order:
• if a node for the item already exists on the current path,
increase its counter by 1; otherwise create a new node for
the item with a counter value of 1;
• amend the node-link and parent-link pointers accordingly.
• First database scan – determine frequent 1-itemsets

  TID   Items
  100   1 3 4
  200   2 3 5
  300   1 2 3 5
  400   2 5

  Candidate counts C:  1:2  2:3  3:3  4:1  5:3   (min sup = 2)
  Frequent items F (descending support):  2:3  3:3  5:3  1:2

  Header table L lists the frequent items in this order (2, 3, 5, 1);
  the head-of-node-link pointers are filled in during the second scan.
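As a sketch, the first scan over the four transactions above can be done with a `Counter` (item names and the min sup value are taken from the slide; variable names are my own):

```python
from collections import Counter

transactions = [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]
min_sup = 2

# Count the support of every item (candidate set C).
counts = Counter(item for t in transactions for item in t)

# Keep frequent items, sorted by descending support: header-table order L.
L = sorted((i for i in counts if counts[i] >= min_sup),
           key=lambda i: (-counts[i], i))

print(L)  # [2, 3, 5, 1]
```

Item 4 is dropped because its support (1) is below min sup; ties (2, 3 and 5 all have support 3) are broken by item name.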
• Second scan of database – build the FP-Tree, filling in the
header-table node-links as nodes are created
  Insert transaction 100 (items 1 3 4; frequent items in L order: 3 1):

  null
  └── 3:1
      └── 1:1
• Insert transaction 200 (2 3 5):

  null
  ├── 3:1
  │   └── 1:1
  └── 2:1
      └── 3:1
          └── 5:1
• Insert transaction 300 (1 2 3 5; in L order: 2 3 5 1):

  null
  ├── 3:1
  │   └── 1:1
  └── 2:2
      └── 3:2
          └── 5:2
              └── 1:1
• Insert transaction 400 (2 5); the FP-Tree is complete:

  null
  ├── 3:1
  │   └── 1:1
  └── 2:3
      ├── 3:2
      │   └── 5:2
      │       └── 1:1
      └── 5:1

  Header-table node-link chains: 2 → 2:3; 3 → 3:1, 3:2;
  5 → 5:2, 5:1; 1 → 1:1, 1:1
• FP-Growth for Frequent Itemsets
Start from the bottom of the header table and work towards the top; for
each item (suffix) in the header table:
1. Create a frequent itemset with the item and its total support;
2. Obtain the subtree with the item as leaves;
3. If the resulting tree consists of a single path:
a) Generate all combinations of the items along the path;
b) Attach the suffix to each combination to form a frequent itemset;
c) Set the support of each itemset to the lowest count among the nodes involved.
Otherwise:
a) Determine the conditional pattern base of the suffix;
b) Construct a conditional FP-tree for the item;
c) Repeat the process from step 2.
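The two phases can be sketched end to end in Python. This is a compact illustration, not the lecture's reference implementation: the function names `build_tree` and `mine` are my own, and for simplicity it always recurses on the conditional pattern base rather than applying the single-path shortcut of step 3 (the results are the same):

```python
from collections import defaultdict

class Node:
    def __init__(self, item, parent):
        self.item, self.count, self.parent = item, 0, parent
        self.children = {}

def build_tree(transactions, min_sup):
    """Phase 1: two scans -> (frequent-item supports, header table)."""
    support = defaultdict(int)
    for t in transactions:                     # first scan: count supports
        for i in t:
            support[i] += 1
    frequent = {i: s for i, s in support.items() if s >= min_sup}
    root = Node(None, None)
    header = defaultdict(list)                 # item -> its node-link chain
    for t in transactions:                     # second scan: insert paths
        path = sorted((i for i in t if i in frequent),
                      key=lambda i: (-frequent[i], i))
        node = root
        for i in path:
            if i not in node.children:
                node.children[i] = Node(i, node)
                header[i].append(node.children[i])
            node = node.children[i]
            node.count += 1
    return frequent, header

def mine(transactions, min_sup, suffix=frozenset()):
    """Phase 2: grow each suffix via its conditional pattern base."""
    frequent, header = build_tree(transactions, min_sup)
    results = {}
    for item, nodes in header.items():
        itemset = suffix | {item}
        results[itemset] = frequent[item]      # support from the tree nodes
        base = []                              # conditional pattern base:
        for n in nodes:                        # the prefix path of each node,
            path, p = [], n.parent             # repeated once per count
            while p.item is not None:
                path.append(p.item)
                p = p.parent
            base.extend([path] * n.count)
        if any(base):                          # any non-empty prefix path?
            results.update(mine(base, min_sup, itemset))
    return results

T = [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]
F = mine(T, 2)
# e.g. F[frozenset({2, 3, 5})] == 2, and len(F) == 9
```

On the lecture's four transactions this yields exactly the nine frequent itemsets listed later in the slides.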
Terminology
• Suffix
• Given a string pattern "ABGED":
• D is a suffix of ABGED, while ABGE is the corresponding prefix
• ED is a suffix of ABGED, while ABG is the corresponding prefix
The complete FP-Tree:

  null
  ├── 3:1
  │   └── 1:1
  └── 2:3
      ├── 3:2
      │   └── 5:2
      │       └── 1:1
      └── 5:1
• FP-Growth (Example)
Process the header table from bottom to top (suffixes 1, 5, 3, 2):

  Suffix 1 (support 2):
    Prefix paths: (3):1 and (2 3 5):1 – the conditional pattern base of 1
    Conditional FP-tree: null → 3:2 (only 3 remains frequent)
    Frequent itemsets: ({1}, 2), ({1,3}, 2)

  Suffix 5 (support 3):
    Conditional pattern base: (2 3):2 and (2):1
    Conditional FP-tree: null → 2:3 → 3:2 (a single path)
    Frequent itemsets: ({5}, 3), ({2,5}, 3), ({3,5}, 2), ({2,3,5}, 2)

  Suffix 3 (support 3):
    Conditional pattern base: (2):2
    Conditional FP-tree: null → 2:2
    Frequent itemsets: ({3}, 3), ({2,3}, 2)

  Suffix 2 (support 3):
    Empty conditional pattern base
    Frequent itemset: ({2}, 3)

F = {({1}, 2), ({1,3}, 2), ({5}, 3), ({3,5}, 2), ({2,5}, 3), ({2,3,5}, 2),
({3}, 3), ({2,3}, 2), ({2}, 3)}
• Strengths and Weaknesses
✓ No need to scan the database many times (only twice)
✓ Compact representation of the database in the FP-Tree
✓ Very good when the FP-tree is dense without many branches
✓ Supports for frequent itemsets are derived directly from the counts
in the FP-tree nodes
✓ Some tests show it to be several times faster than Apriori
✗ Not very good when the database is sparse (too many
branches in the FP-tree)
✗ Resource implications of the recursive solution (each conditional
FP-tree must be held in memory)
✗ Much more complex than Apriori, and therefore quite difficult
to implement
Evaluating Association Rule Quality
• Some Evaluators
• Support, coverage and confidence: strength of posterior association
• Support and confidence alone may not be sufficient
• Lift measure:

  lift(X ⇒ Y) = confidence(X ⇒ Y) / support(Y)
              = support(X ∪ Y) / (support(X) × support(Y))
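With supports expressed as fractions of the transaction count, the formula can be checked numerically. The counts below come from the four-transaction example used earlier in the lecture; the helper name `lift` is my own:

```python
def lift(sup_x, sup_y, sup_xy):
    """lift(X => Y) = confidence(X => Y) / support(Y)
                    = support(X u Y) / (support(X) * support(Y))"""
    confidence = sup_xy / sup_x
    return confidence / sup_y

# X = {2}, Y = {5}: each appears in 3 of 4 transactions, together in 3.
print(lift(3/4, 3/4, 3/4))   # 1.333... > 1: positively correlated
```

A lift above 1 means buying X raises the chance of buying Y; a lift of exactly 1 means X and Y are independent, however high the rule's support and confidence.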
• Interactive Discovery
• Roles of data miners in parameter setting
• Set initial minimum support and minimum confidence thresholds
• May fine-tune the thresholds during the discovery phase, based on
understanding of the rules discovered and the number of rules found
• Roles of data miners in constraint specification
• Left-hand-side item specification in X
e.g. what other items do customers buy if they buy a T-shirt?
• Right-hand-side item specification in Y
e.g. what items lead to the sale of T-shirts?
• Item specification in both X and Y
e.g. verify the truth of Jeans ⇒ T-shirt
Association Rule Mining in Practice