
Lecture 5: FP-Growth Algorithm

FP-Growth Algorithm
• Uses a Frequent Pattern Tree (FP-tree) as a compact representation of the input database
• The root node is labelled null
• Each other node comprises an item name, a support counter, a node-link pointer (to the next node in the tree carrying the same item) and a parent-link pointer (to the parent of the current node)
• A header table is constructed to serve as an index to the FP-tree: each entry consists of a frequent item name, the item's support count and a pointer to the head of that item's node-link chain

• Only two scans of the database are required to construct the FP-tree

• Frequent itemsets are derived from the FP-tree (a sketch of these structures follows)
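A minimal Python sketch of the node and header-table structures described above; the class and variable names (FPNode, header_table) are illustrative assumptions, not from the lecture:

class FPNode:
    def __init__(self, item, parent):
        self.item = item        # item name (None for the null root)
        self.count = 1          # support counter
        self.parent = parent    # parent-link pointer
        self.children = {}      # child nodes, keyed by item name
        self.node_link = None   # next node carrying the same item

root = FPNode(None, None)       # root node labelled null

# Header table: item -> [support count, head of node-link chain]
header_table = {}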


A review of a tree structure

[Figure: a generic tree, with the root node at the top]
FP-Growth Algorithm
Phase 1: FP-tree construction
1. Construct an FP-tree with a header table
Phase 2: Mining frequent itemsets
1. Mine frequent itemsets from the FP-tree by pattern fragment growth
Phase 1: FP-Growth Algorithm
• FP-tree construction algorithm (a code sketch follows this list)
1. Collect individual items, count their supports and obtain a list of frequent items;
2. Sort the items in descending order of support and save them to header table L;
3. Create the root node for tree T and label it null;
4. For each transaction t do the following:
a) Sort t in the same order as in L;
b) For each item of the transaction, in that order:
• if a node for the item already exists, increase its counter value by 1; otherwise create a new node for the item with a default counter value of 1;
• amend the node-link and parent-link pointer references accordingly.
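A minimal sketch of this construction in Python, reusing the hypothetical FPNode class above; build_fp_tree is an illustrative name, not from the lecture:

from collections import Counter

def build_fp_tree(transactions, min_sup):
    # First scan: count supports and keep the frequent items
    counts = Counter(i for t in transactions for i in t)
    frequent = {i: c for i, c in counts.items() if c >= min_sup}
    # Header table L, in descending order of support (ties broken by item)
    order = sorted(frequent, key=lambda i: (-frequent[i], i))
    header = {i: [frequent[i], None] for i in order}

    root = FPNode(None, None)
    # Second scan: insert each transaction along the sorted order
    for t in transactions:
        items = sorted((i for i in t if i in frequent), key=order.index)
        node = root
        for item in items:
            if item in node.children:      # node already exists: count + 1
                node.children[item].count += 1
            else:                          # otherwise create a new node
                child = FPNode(item, node)
                node.children[item] = child
                # amend the node-link chain in the header table
                child.node_link = header[item][1]
                header[item][1] = child
            node = node.children[item]
    return root, header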
FP-Growth Algorithm
• First database scan – determine frequent 1-itemsets (min sup = 2); a small code sketch follows the tables

Transactions:
TID   Items
100   1 3 4
200   2 3 5
300   1 2 3 5
400   2 5

Candidate 1-itemsets C:
Items  Support
1      2
2      3
3      3
4      1
5      3

Frequent 1-itemsets F (sorted in descending order of support):
Items  Support
2      3
3      3
5      3
1      2

Header Table
Items  Head of Node-link
2
3
5
1
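The first scan is straightforward to express in code; a small sketch using the example transactions (variable names are illustrative):

from collections import Counter

tx = [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]
counts = Counter(i for t in tx for i in t)    # C: {1: 2, 3: 3, 4: 1, 2: 3, 5: 3}
F = sorted((i for i in counts if counts[i] >= 2),
           key=lambda i: (-counts[i], i))     # [2, 3, 5, 1], as in the table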
FP-Growth Algorithm
• Second scan of database – building the FP-tree and linking the header table
Insert TID 100 = {1, 3, 4}; item 4 is infrequent, so the sorted items (by L) are 3, 1:

null
 └─ 3:1
      └─ 1:1

The header table's node-links for 3 and 1 now point to these new nodes.
FP-Growth Algorithm
• Building the FP-tree (continued)
Insert TID 200 = {2, 3, 5}; sorted: 2, 3, 5 – a new branch from the root:

null
 ├─ 3:1
 │    └─ 1:1
 └─ 2:1
      └─ 3:1
           └─ 5:1
FP-Growth Algorithm
• Building the FP-tree (continued)
Insert TID 300 = {1, 2, 3, 5}; sorted: 2, 3, 5, 1 – the transaction shares the 2-3-5 path, so those counters are incremented and a new node 1:1 is appended:

null
 ├─ 3:1
 │    └─ 1:1
 └─ 2:2
      └─ 3:2
           └─ 5:2
                └─ 1:1
FP-Growth Algorithm
• Building the FP-tree (continued)
Insert TID 400 = {2, 5}; sorted: 2, 5 – the counter of 2 is incremented and a new node 5:1 is created under it. The completed FP-tree (a code snippet reproducing it follows):

null
 ├─ 3:1
 │    └─ 1:1
 └─ 2:3
      ├─ 3:2
      │    └─ 5:2
      │         └─ 1:1
      └─ 5:1

Header Table (each entry's node-link chains all nodes of that item)
Items  Head of Node-link
2      → 2:3
3      → 3:1 → 3:2
5      → 5:2 → 5:1
1      → 1:1 → 1:1
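Running the hypothetical build_fp_tree sketch from earlier on these transactions reproduces this tree:

tx = [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]
root, header = build_fp_tree(tx, min_sup=2)
# root has children 3:1 and 2:3, matching the diagram above
print({i: root.children[i].count for i in root.children})   # {3: 1, 2: 3}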
FP-Growth Algorithm
• FP-Growth for frequent itemsets (a code sketch follows this list)
Start from the bottom of the header table and work towards the top; for each item in the header table:
1. Create a frequent itemset from the item and its total support;
2. Obtain the subtree with the item as its leaves;
3. If the resulting tree consists of a single path:
a) Generate all combinations of the items along the path;
b) Attach the suffix to each combination to form a frequent itemset;
c) Set the support to the lowest count among the nodes involved.
Otherwise:
a) Determine the conditional pattern base of the suffix;
b) Construct a conditional FP-tree for the item;
c) Repeat the process from step 2.
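A Python sketch of this mining phase, continuing the hypothetical FPNode and build_fp_tree helpers from the construction sketch. The single-path case of step 3 is handled implicitly here: recursing on a single-path conditional tree enumerates all combinations of its items, with the supports coming from the node counts:

def prefix_paths(header, item):
    # Conditional pattern base: prefix paths in the tree co-occurring
    # with `item`, gathered by walking the item's node-link chain
    paths = []
    node = header[item][1]
    while node is not None:
        path, p = [], node.parent
        while p is not None and p.item is not None:
            path.append(p.item)
            p = p.parent
        paths.append((path[::-1], node.count))
        node = node.node_link
    return paths

def fp_growth(root, header, min_sup, suffix=()):
    # Work from the bottom of the header table towards the top
    results = {}
    for item in reversed(list(header)):
        new_suffix = (item,) + suffix
        results[new_suffix] = header[item][0]   # the item and its total support
        # Conditional pattern base, expanded by path counts for simplicity
        base = [p for path, c in prefix_paths(header, item)
                for p in [path] * c]
        # Conditional FP-tree for the item; repeat the process on it
        cond_root, cond_header = build_fp_tree(base, min_sup)
        if cond_header:
            results.update(fp_growth(cond_root, cond_header,
                                     min_sup, new_suffix))
    return results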
Terminology
• Suffix
• Given a string pattern “ABGED”:
• D is a suffix, and ABGE is the prefix of D;
• ED is a suffix, and ABG is the prefix of ED.

• Conditional pattern base – the set of prefix paths in the FP-tree co-occurring with the suffix pattern.
Terminology
• Conditional FP-tree – an FP-tree constructed from the conditional pattern base.
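Using the hypothetical prefix_paths helper from the mining sketch, the conditional pattern base of item 5 in the running example can be read off directly:

print(prefix_paths(header, 5))   # e.g. [([2], 1), ([2, 3], 2)]

Each pair is a prefix path co-occurring with 5, together with its count.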
FP-Growth Algorithm
• FP-Growth (Example) – suffix item 1 (bottom of the header table)

Prefix paths ending in 1, taken from the completed FP-tree:

null
 ├─ 3:1
 │    └─ 1:1
 └─ 2:3
      └─ 3:2
           └─ 5:2
                └─ 1:1

Conditional pattern base of 1: {(3): 1, (2 3 5): 1}
Conditional FP-tree for 1: null → 3:2 (only item 3 reaches min sup)
Frequent itemsets generated: ({1}, 2), ({1,3}, 2)
FP-Growth Algorithm
• FP-Growth (Example) – suffix item 5

Prefix paths ending in 5:

null
 └─ 2:3
      ├─ 3:2
      │    └─ 5:2
      └─ 5:1

Conditional pattern base of 5: {(2 3): 2, (2): 1}
Conditional FP-tree for 5: null → 2:3 → 3:2 (a single path, so all combinations are generated)
Frequent itemsets generated: ({5}, 3), ({2,5}, 3), ({3,5}, 2), ({2,3,5}, 2)
FP-Growth Algorithm
• FP-Growth (Example) – suffix item 3

Prefix paths ending in 3 (counts adjusted to the occurrences of 3):

null
 ├─ 3:1
 └─ 2:2
      └─ 3:2

Conditional pattern base of 3: {(2): 2}
Conditional FP-tree for 3: null → 2:2
Frequent itemsets generated: ({3}, 3), ({2,3}, 2)
FP-Growth Algorithm
• FP-Growth (Example) – suffix item 2 (top of the header table)

Item 2 has no prefix paths (its node hangs directly off the root), so only a single node remains:

null
 └─ 2:3

Frequent itemset generated: ({2}, 3)

All frequent itemsets found (a code check follows):
F = {({1}, 2), ({1,3}, 2), ({5}, 3), ({3,5}, 2), ({2,5}, 3), ({2,3,5}, 2), ({3}, 3), ({2,3}, 2), ({2}, 3)}
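As a quick check, running the construction and mining sketches from earlier on the four example transactions reproduces this set (using the tuple representation of the sketch):

tx = [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]
root, header = build_fp_tree(tx, min_sup=2)
print(fp_growth(root, header, min_sup=2))
# e.g. {(1,): 2, (3, 1): 2, (5,): 3, (3, 5): 2, (2, 3, 5): 2, (2, 5): 3, ...}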
FP-Growth Algorithm
• Strengths and Weaknesses
✓ No need to scan the database many times (only twice)
✓ Compact representation of the database in the FP-tree
✓ Very good when the FP-tree is dense without many branches
✓ Supports for frequent itemsets are derived from the supports in the FP-tree nodes
✓ Some tests show it to be several times faster than Apriori
✗ Not very good when the database is sparse (too many branches in the FP-tree)
✗ Resource implications of recursive solutions
✗ Much more complex than Apriori, and therefore quite difficult to implement
Evaluating Association Rule Quality
• Some Evaluators
• Support, coverage, and confidence: strength of posterior association
• Support and confidence alone may not be sufficient
• Lift measure:

lift(X ⇒ Y) = confidence(X ⇒ Y) / support(Y) = support(X ∪ Y) / (support(X) × support(Y))

where support is expressed as a fraction

• Lift = 1: X and Y are independent; Lift > 1: the occurrence of Y is positively related to the occurrence of X; Lift < 1: the occurrence of Y is related to the absence of X.
e.g. {2,3} ⇒ {5}: confidence = 100% and sup({5}) = 75%
Lift({2,3} ⇒ {5}) = 100/75 ≈ 1.33 (a sketch of this calculation follows)
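A small illustration of the lift calculation, computing fractional supports from the four example transactions (the function name is illustrative):

def lift(transactions, X, Y):
    n = len(transactions)
    # support of an itemset as the fraction of transactions containing it
    sup = lambda s: sum(s <= set(t) for t in transactions) / n
    return sup(X | Y) / (sup(X) * sup(Y))

tx = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
print(lift(tx, {2, 3}, {5}))   # 0.5 / (0.5 * 0.75) ≈ 1.33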
Association Rule Mining in Practice
• Process of an Association Mining Project
1. Objective definition (association between what?)
2. Data preparation (getting the relevant attributes)
3. Identifying transactions and items
4. Setting parameters
5. Conducting the mining
6. Evaluating association rule quality and strength
7. Rule filtering and selection (a real problem at present)
Association Rule Mining in Practice
• Making Association Rule Mining Efficient
• Exponential time complexity
• Combinatorial nature of the problem: big search space
• Algorithm improvement: limited success so far
• Implementation techniques: limited scale of improvement through use of good data structures
• Sampling the database: reducing the number of transactions to be used for discovery of rules
• Parallel processing: hardware solution – sharing the workload among multiple processors
Association Rule Mining in Practice
• Interactive Discovery
• Roles of data miners in parameter setting
• Set initial minimum support and minimum confidence thresholds
• May fine-tune thresholds during the discovery phase based on understanding of the rules discovered and the number of rules found
• Roles of data miners in constraint specification
• Left-hand-side item specification in X
e.g. what other items do customers buy if they buy a T-shirt?
• Right-hand-side item specification in Y
e.g. what items lead to the sales of T-shirts?
• Item specification in both X and Y
e.g. verify the truth of Jeans ⇒ T-shirt
Association Rule Mining in Practice
• Various Boolean Rules
• Rules leading to a course of action (e.g. banana ⇒ tomato)
• Rules having no course of action (i.e. rules with inexplicable associations – meanings that are not easy to explain)
• Rules that state known knowledge (e.g. link_to_homepage ⇒ link_to_main_picture: the main picture is ON the homepage)
• Anonymous rules (no transaction IDs identified) vs. signature rules (transaction IDs identified)
Summary
• Boolean association rules state associations between the presence of certain items and the presence of some other items
• Association rules are measured and evaluated in terms of support, confidence and lift
• The most significant part of association rule mining is the discovery of frequent itemsets
• The Apriori algorithm for finding frequent itemsets follows a create-and-test greedy approach. The FP-Growth algorithm offers an alternative by traversing an FP-tree without generating candidate itemsets
• The Apriori approach is also used for generating rule expressions
• Association rule mining faces practical problems and issues
Useful Further References

• Read Chapter 8 of Data Mining Techniques and Applications
• Tan, P.-N., Steinbach, M. and Kumar, V. (2006), Introduction to Data Mining, Addison-Wesley, Chapter 6
• Han, J. and Kamber, M. (2006), Data Mining: Concepts and Techniques, 2nd ed., Morgan Kaufmann, Chapter 5
