
Mining Frequent Patterns

without Candidate Generation


Association Rule Mining
 FP-growth is an improved version of the Apriori algorithm and is widely used for frequent pattern mining.
 Frequent pattern mining is an analytical process that finds frequent patterns or associations in data sets. For example, grocery store transaction data might contain the frequent pattern that people usually buy chips and beer together.
Apriori Algorithm
 The Apriori algorithm produces frequent patterns by generating candidate itemsets and keeping those whose support meets a "minimum support count" threshold.
 It greatly reduces the number of candidate itemsets examined by one simple principle:
 If an itemset is frequent, then all of its subsets must also be frequent. Equivalently, if any subset of an itemset is infrequent, the itemset itself can be pruned without ever counting it.
Apriori Algorithm
 The Apriori algorithm is an influential algorithm for mining frequent itemsets for boolean association rules.
 It uses prior (a priori) knowledge of frequent itemset properties.
 It uses the frequent k-itemsets to find the frequent (k+1)-itemsets.
 It is based on three concepts: frequent itemsets, the Apriori property, and join operations. A sketch of the join-and-prune step is shown below.
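A minimal sketch of the join-and-prune step in Python (the function name generate_candidates is illustrative, not from the slides):

from itertools import combinations

def generate_candidates(frequent_k, k):
    # Join: union pairs of frequent k-itemsets that differ in one item.
    candidates = {a | b for a in frequent_k for b in frequent_k
                  if len(a | b) == k + 1}
    # Prune (Apriori property): every k-subset of a surviving candidate
    # must itself be frequent.
    return {c for c in candidates
            if all(frozenset(s) in frequent_k for s in combinations(c, k))}

# {a,b}, {a,c}, {b,c} join to {a,b,c}; it survives the prune because
# all of its 2-subsets are frequent.
frequent_2 = {frozenset("ab"), frozenset("ac"), frozenset("bc")}
print(generate_candidates(frequent_2, 2))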
Apriori Algorithm
 Advantages:
 Easy to understand and implement.
 Can be easily parallelized.
 Uses the large (frequent) itemset property.
 Disadvantages:
 Requires many database scans.
 Assumes the transaction database is memory resident.
 The Apriori algorithm has a major shortfall: it requires multiple scans of the database to check the support count of each item and itemset.
 When the database is huge, this costs a significant amount of disk I/O and computing power.
 Therefore the FP-growth algorithm was created to overcome this shortfall. It scans the database only twice and uses a tree structure (the FP-tree) to store all the information.
FP Tree
 The root represents null.
 Each node represents an item.
 The association of the nodes in the itemsets follows the order maintained while forming the tree.
 The FP-tree is concise and is used to directly generate large itemsets.
 Once an FP-tree has been constructed, a recursive divide-and-conquer approach mines the frequent itemsets from it.
Introduction
 Terminology
 Apriori-like Algorithms
 Generate-and-Test
 Cost Bottleneck
 FP-Tree and FP-Growth Algorithm
 FP-Tree: Frequent Pattern Tree
 FP-Growth: Mining frequent patterns with FP-Tree
Terminology
 Item set
 A set of items: I = {a1, a2, …, am}
 Transaction database
 DB = <T1, T2, …, Tn>
 Pattern
 A set of items: A ⊆ I
 Support
 The number of transactions containing A in DB
 Frequent pattern
 A pattern A whose support ≥ the minimum support threshold ξ
 Frequent Pattern Mining Problem
 The problem of finding the complete set of frequent patterns (a minimal sketch of computing support follows this list)
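A minimal sketch of computing support, using the transaction database from Example 1 later in the deck:

def support(pattern, db):
    # Number of transactions in db containing every item of pattern.
    return sum(1 for t in db if pattern <= t)

db = [set("facdgimp"), set("abcflmo"), set("bfhjo"),
      set("bcksp"), set("afcelpmn")]
print(support({"f", "c", "a"}, db))   # 3, so frequent for xi = 3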
FP-Tree and FP-Growth Algorithm
 FP-Tree: Frequent Pattern Tree
 Compact representation of the DB without information loss.
 Easy to traverse; patterns associated with a certain item can be found quickly.
 Well-ordered by item frequency.
 FP-Growth Algorithm
 Start mining from length-1 patterns.
 Recursively do the following:
 Construct each pattern's conditional FP-tree
 Concatenate patterns from the conditional FP-tree with the suffix
 Divide-and-conquer mining technique
Outline
 Introduction
 Constructing FP-Tree
 Example 1
 Mining Frequent Patterns using FP-Tree
 Example 2
 Performance Evaluation
 Discussions
FP-Tree Definition
 Three components:
 One root: labeled as "null"
 A set of item prefix subtrees
 A frequent-item header table

The example FP-tree used throughout the deck (reconstructed from the slide figure):

root
├── f:4
│   ├── c:3
│   │   └── a:3
│   │       ├── m:2
│   │       │   └── p:2
│   │       └── b:1
│   │           └── m:1
│   └── b:1
└── c:1
    └── b:1
        └── p:1

Header Table (item → head of node-links): f, c, a, b, m, p
FP-Tree Definition (cont.)
 Each node in the item prefix subtree consists of three fields:
 item-name
 count
 node-link
 Each entry in the frequent-item header table consists of two fields:
 item-name
 head of node-link
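A possible Python rendering of these two structures (the dict-of-children layout and the header_table name are implementation choices, not prescribed by the slides):

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class FPNode:
    item: Optional[str]                    # item-name (None for the root)
    count: int = 0                         # count
    parent: Optional["FPNode"] = None
    node_link: Optional["FPNode"] = None   # next node with the same item-name
    children: dict = field(default_factory=dict)  # item-name -> FPNode

# Frequent-item header table: item-name -> head of that item's node-links.
header_table: dict = {}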
Example 1: FP-Tree Construction
 The transaction database used (first two columns only):

TID   Items Bought
100   f,a,c,d,g,i,m,p
200   a,b,c,f,l,m,o
300   b,f,h,j,o
400   b,c,k,s,p
500   a,f,c,e,l,p,m,n

 Minimum support threshold ξ = 3
Example 1 (cont.)
 First scan: count and sort (a Python sketch follows the table)
 Count the frequency of each item.
 Collect the length-1 frequent items, then sort them in support-descending order into L, the frequent item list:
L = {(f:4), (c:4), (a:3), (b:3), (m:3), (p:3)}

TID   Items Bought       (Ordered) Frequent Items
100   f,a,c,d,g,i,m,p    f,c,a,m,p
200   a,b,c,f,l,m,o      f,c,a,b,m
300   b,f,h,j,o          f,b
400   b,c,k,s,p          c,b,p
500   a,f,c,e,l,p,m,n    f,c,a,m,p
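A sketch of the first scan in Python (the function name first_scan is illustrative). Note that the tie order among the items with support 3 is arbitrary; the slides use b, m, p, and any consistent order works:

from collections import Counter

def first_scan(db, min_support):
    counts = Counter(item for trans in db for item in trans)
    # L: frequent items in support-descending order.
    L = [item for item, c in counts.most_common() if c >= min_support]
    rank = {item: i for i, item in enumerate(L)}
    # Keep only frequent items in each transaction, ordered by L.
    ordered = [sorted((i for i in trans if i in rank), key=rank.get)
               for trans in db]
    return L, ordered

db = [list("facdgimp"), list("abcflmo"), list("bfhjo"),
      list("bcksp"), list("afcelpmn")]
L, ordered = first_scan(db, 3)
print(L)        # f and c first (support 4); a, b, m, p follow (support 3)
print(ordered)  # e.g. transaction 100 becomes ['f', 'c', 'a', 'm', 'p']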
Example 1 (cont.)
 Second scan: create the tree and header table
 Create the root and label it "null".
 For each transaction Trans:
 Select and sort the frequent items in Trans.
 Increase node counts or create new nodes:
 If the prefix nodes already exist, increase their counts by 1;
 If a prefix node does not exist, create it and set its count to 1.
 Build the item header table:
 Nodes with the same item-name are linked in sequence via node-links. A sketch of this insertion step follows.
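A minimal sketch of the insertion step (a simplified FPNode class is inlined so the snippet stands alone):

class FPNode:
    def __init__(self, item, parent=None):
        self.item, self.count, self.parent = item, 0, parent
        self.children = {}       # item-name -> FPNode
        self.node_link = None    # next node carrying the same item-name

def insert_tree(items, node, header):
    # Walk/extend one ordered frequent-item list below `node`.
    if not items:
        return
    first, rest = items[0], items[1:]
    child = node.children.get(first)
    if child is None:                        # no prefix node: create it
        child = FPNode(first, parent=node)
        node.children[first] = child
        child.node_link = header.get(first)  # thread into node-link chain
        header[first] = child
    child.count += 1                         # prefix node exists: count + 1
    insert_tree(rest, child, header)

root, header = FPNode(None), {}
for trans in ["fcamp", "fcabm", "fb", "cbp", "fcamp"]:
    insert_tree(list(trans), root, header)
print(root.children["f"].count)                  # 4
print(root.children["f"].children["c"].count)    # 3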
Example 1 (cont.)
The building process of the tree (reconstructed from the slide figures):

Create root:                root only
After trans 1 (f,c,a,m,p):  root → f:1 → c:1 → a:1 → m:1 → p:1
After trans 2 (f,c,a,b,m):  shared prefix becomes f:2 → c:2 → a:2, which keeps
                            the branch m:1 → p:1 and gains a branch b:1 → m:1
After trans 3 (f,b):        f becomes f:3 and gains a direct child b:1
Example 1 (cont.)
The building process of the tree (cont.):

After trans 4 (c,b,p):      a new branch from the root: c:1 → b:1 → p:1
After trans 5 (f,c,a,m,p):  counts on the shared path increase:
                            root → f:4 → c:3 → a:3 → m:2 → p:2
Example 1 (cont.)
Build the item header table. The final tree:

root
├── f:4
│   ├── c:3
│   │   └── a:3
│   │       ├── m:2
│   │       │   └── p:2
│   │       └── b:1
│   │           └── m:1
│   └── b:1
└── c:1
    └── b:1
        └── p:1

Header Table (item → head of node-links): f, c, a, b, m, p
FP-Tree Properties
 Completeness
 Each transaction that contains a frequent pattern is mapped to a path.
 Prefix sharing does not cause path ambiguity, as only a path starting from the root represents a transaction.
 Compactness
 The number of nodes is bounded by the total occurrences of frequent items.
 The height of the tree is bounded by the maximal number of frequent items in any transaction.
FP-Tree Properties (cont.)
 Traversal friendly (for the mining task)
 For any frequent item ai, all the possible frequent patterns that contain ai can be obtained by following ai's node-links.
 This property is important for divide-and-conquer: it assures the soundness and completeness of the problem reduction.
Outline
 Introduction
 Constructing FP-Tree
 Example 1
 Mining Frequent Patterns using FP-Tree
 Example 2
 Performance Evaluation
 Discussions
FP-Growth Algorithm
 Functionality:
 Mine frequent patterns using the FP-tree generated before.
 Input:
 The FP-tree constructed earlier
 The minimum support threshold ξ
 Output:
 The complete set of frequent patterns
 Main algorithm:
 Call FP-growth(FP-tree, null)
FP-growth(Tree, α)
Procedure FP-growth(Tree, α)
{
  if (Tree contains only a single path P)
  {
    for each combination β of the nodes in P
    {
      generate pattern β ∪ α;
      β.support = min(support of all nodes in β);
    }
  }
  else  // Tree contains more than one path
  {
    for each ai in the header table of Tree
    {
      generate pattern β = ai ∪ α;
      β.support = ai.support;
      construct β's conditional pattern base;
      construct β's conditional FP-tree Treeβ;
      if (Treeβ ≠ Φ)
        FP-growth(Treeβ, β);
    }
  }
}
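A compact Python rendering of the same recursion. To stay short, it represents each (conditional) FP-tree implicitly as a list of (ordered prefix path, count) pairs instead of a linked tree; this keeps the logic of conditional pattern bases and pattern growth intact, but it is a sketch under that simplification, not the paper's exact data structure:

from collections import Counter

def fp_growth(paths, min_sup, alpha, results):
    # paths: list of (items_tuple, count); items ordered by global frequency.
    counts = Counter()
    for items, cnt in paths:
        for item in items:
            counts[item] += cnt
    for item, sup in counts.items():
        if sup < min_sup:
            continue
        beta = frozenset(alpha | {item})
        results[beta] = sup                  # pattern beta = {item} U alpha
        # beta's conditional pattern base: the prefix before `item`
        # in every path that contains it, carrying that path's count.
        cond = [(items[:items.index(item)], cnt)
                for items, cnt in paths if item in items]
        fp_growth(cond, min_sup, beta, results)

# Ordered transactions from Example 1, each with count 1.
db = ["fcamp", "fcabm", "fb", "cbp", "fcamp"]
results = {}
fp_growth([(tuple(t), 1) for t in db], 3, frozenset(), results)
print(len(results))                 # 18 frequent patterns, as in Example 2
print(results[frozenset("fcam")])   # 3, i.e. (fcam:3)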
Example 2
 Start from the bottom of the header table: node p.
 p lies on two paths of the FP-tree built in Example 1, which are transformed into prefix paths.
 p's conditional pattern base:
 {(f:2, c:2, a:2, m:2), (c:1, b:1)}
 p's conditional FP-tree:
 Only one branch, (c:3)
 Patterns:
 (p:3)
 (cp:3)
Example 2 (cont.)
 Continue with node m.
 m lies on two paths.
 m's conditional pattern base:
 {(f:2, c:2, a:2), (f:1, c:1, a:1, b:1)}
 m's conditional FP-tree:
 (f:3, c:3, a:3)
 Call mine(<f:3, c:3, a:3> | m)
 Patterns:
 (m:3)
 see next slide
mine(<f:3, c:3, a:3> | m)

Conditional FP-tree of "m":  root → f:3 → c:3 → a:3
Header table: f, c, a

 node a:
 (am:3)
 call mine(<f:3, c:3> | am)
 (cam:3)
 call mine(<f:3> | cam)
 (fcam:3)
 (fam:3)
 node c:
 (cm:3)
 call mine(<f:3> | cm)
 (fcm:3)
 node f:
 (fm:3)
 All the patterns: (m:3, am:3, cm:3, fm:3, cam:3, fam:3, fcm:3, fcam:3)
 Conclusion: a single-path FP-tree can be mined by outputting all the combinations of the items in the path, as the sketch below shows.
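A sketch of that conclusion for the single-path tree above, enumerating every subpath combination and taking the minimum node count as its support (Lemma 3.2 later in the deck):

from itertools import combinations

path = [("f", 3), ("c", 3), ("a", 3)]    # conditional FP-tree of "m"
suffix = "m"
for r in range(1, len(path) + 1):
    for combo in combinations(path, r):
        pattern = "".join(item for item, _ in combo) + suffix
        sup = min(cnt for _, cnt in combo)   # weakest node caps the support
        print(pattern, sup)
# fm 3, cm 3, am 3, fcm 3, fam 3, cam 3, fcam 3 (plus m:3 itself)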
Example 2 (cont.)
 Continue with node b.
 b lies on three paths.
 b's conditional pattern base:
 {(f:1, c:1, a:1), (f:1), (c:1)}
 b's conditional FP-tree:
 Φ (every item in the base falls below ξ = 3)
 Patterns:
 (b:3)
Example 2 (cont.)
 Continue with node a.
 a lies on one path.
 a's conditional pattern base:
 {(f:3, c:3)}
 a's conditional FP-tree:
 {(f:3, c:3)}
 Patterns:
 (a:3)
 (ca:3)
 (fa:3)
 (fca:3)
Example 2 (cont.)
 Continue with node c.
 c lies on two paths.
 c's conditional pattern base:
 {(f:3)}
 c's conditional FP-tree:
 {(f:3)}
 Patterns:
 (c:4)
 (fc:3)
Example 2 (cont.)
 Continue with node f.
 f lies on one path.
 f's conditional pattern base:
 Φ
 f's conditional FP-tree:
 Φ
 Patterns:
 (f:4)
Example 2 (cont.)
 Final results:

item   conditional pattern base                   conditional FP-tree
p      {(f:2, c:2, a:2, m:2), (c:1, b:1)}         {(c:3)} | p
m      {(f:2, c:2, a:2), (f:1, c:1, a:1, b:1)}    {(f:3, c:3, a:3)} | m
b      {(f:1, c:1, a:1), (f:1), (c:1)}            Φ
a      {(f:3, c:3)}                               {(f:3, c:3)} | a
c      {(f:3)}                                    {(f:3)} | c
f      Φ                                          Φ
FP-Growth Properties
 Property 3.2: Prefix path property
 To calculate the frequent patterns for a node ai in a path P, only the prefix subpath of node ai in P needs to be accumulated, and the frequency count of every node in the prefix path should carry the same count as node ai.
 Lemma 3.1: Fragment growth
 Let α be an itemset in DB, B be α's conditional pattern base, and β be an itemset in B. Then the support of α ∪ β in DB is equivalent to the support of β in B.
FP-Growth Properties (cont.)
 Corollary 3.1 (Pattern growth)
 Let α be a frequent itemset in DB, B be α's conditional pattern base, and β be an itemset in B. Then α ∪ β is frequent in DB if and only if β is frequent in B.
 Lemma 3.2 (Single FP-tree path pattern generation)
 Suppose an FP-tree T has a single path P. The complete set of frequent patterns of T can be generated by enumerating all the combinations of the subpaths of P, with the support of each pattern being the minimum support of the items contained in its subpath.
Outline
 Introduction
 Constructing FP-Tree
 Example 1
 Mining Frequent Patterns using FP-Tree
 Example 2
 Performance Evaluation
 Discussions
Performance Evaluation: FP-Tree vs. Apriori
 Scalability with support threshold
Performance Evaluation: FP-Tree vs. Apriori (cont.)
 Per-item runtime actually decreases as the support threshold decreases.
Performance Evaluation: FP-Tree vs. Apriori (cont.)
 Scalability with DB size