Frequent Pattern Growth Algorithm

Association:

Association rule mining is a technique used to uncover hidden relationships between variables in large datasets. It is a popular method in data mining and machine learning, with a wide range of applications in fields such as market basket analysis, customer segmentation, and fraud detection.
The idea behind association rule mining is to determine rules that allow us to identify which objects may be related to a set of objects we already know. In association rule mining terminology, the objects are referred to as items. A common example of association rule mining is market basket analysis.
For example, if 75% of the people who buy cereal also buy milk, then there is a discernible pattern in the transactional data: customers who buy cereal often buy milk. The corresponding association rule states that buying cereal is associated with buying milk (cereal -> milk).
The types of Association Rule Mining are Single-dimensional, Multidimensional,
Quantitative, and Boolean association rules.
Classification rule mining aims to discover a small set of rules in the database that
forms an accurate classifier. Association rule mining finds all the rules existing in the
database that satisfy some minimum support and minimum confidence constraints.
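
To make the two constraints concrete, here is a minimal sketch, not from the original text, of how the support and confidence of a candidate rule can be computed over a list of transactions; the small dataset and item names are purely illustrative:

transactions = [
    {"cereal", "milk", "bread"},
    {"cereal", "milk"},
    {"cereal", "eggs"},
    {"milk", "bread"},
]

def support(itemset, transactions):
    # Fraction of transactions containing every item in `itemset`.
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    # P(consequent | antecedent) = support(antecedent U consequent) / support(antecedent).
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

print(support({"cereal", "milk"}, transactions))       # 0.5
print(confidence({"cereal"}, {"milk"}, transactions))  # 0.666... (2 of 3 cereal buyers)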
Frequent Pattern Growth Algorithm:
The two primary drawbacks of the Apriori Algorithm are:
1. At each step, candidate sets have to be built.
2. To build the candidate sets, the algorithm has to repeatedly scan the database.

These two properties inevitably make the algorithm slower. To overcome these redundant steps, a new association rule mining algorithm named the Frequent Pattern Growth (FP-Growth) Algorithm was developed. It overcomes the disadvantages of the Apriori algorithm by storing all the transactions in a trie data structure, the frequent-pattern tree. Consider the following data:
Transaction ID   Items
T1               {E, K, M, N, O, Y}
T2               {D, E, K, N, O, Y}
T3               {A, E, K, M}
T4               {C, K, M, U, Y}
T5               {C, E, I, K, O, O}
The above is a hypothetical dataset of transactions, with each letter representing an item. The frequency of each individual item is computed, counting each transaction at most once per item (so the duplicate O in T5 contributes only one count):

Item   Frequency
A      1
C      2
D      1
E      4
I      1
K      5
M      3
N      2
O      3
U      1
Y      3
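
As a sketch of this counting step (assuming the dataset above, with each transaction modeled as a Python set so that the duplicate O in T5 is naturally counted once):

from collections import Counter

transactions = [
    {"E", "K", "M", "N", "O", "Y"},  # T1
    {"D", "E", "K", "N", "O", "Y"},  # T2
    {"A", "E", "K", "M"},            # T3
    {"C", "K", "M", "U", "Y"},       # T4
    {"C", "E", "I", "K", "O"},       # T5 (the duplicate O collapses in a set)
]

# Count, for each item, the number of transactions containing it.
freq = Counter(item for t in transactions for item in t)
print(freq)  # K: 5, E: 4, M: 3, O: 3, Y: 3, C: 2, N: 2, A/D/I/U: 1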

Let the minimum support be 3. A Frequent Pattern set L is built, containing all the items whose frequency is greater than or equal to the minimum support. These items are stored in descending order of their frequencies, with the three items tied at count 3 kept in the fixed order M, O, Y. After insertion of the relevant items, the set L looks like this:

L = {K : 5, E : 4, M : 3, O : 3, Y : 3}

Now, for each transaction, the respective Ordered-Item set is built. It is done by iterating over the Frequent Pattern set and checking whether the current item is contained in the transaction in question. If it is, the item is appended to the Ordered-Item set for the current transaction. The following table is built for all the transactions:
Transaction ID   Items                  Ordered-Item Set
T1               {E, K, M, N, O, Y}     {K, E, M, O, Y}
T2               {D, E, K, N, O, Y}     {K, E, O, Y}
T3               {A, E, K, M}           {K, E, M}
T4               {C, K, M, U, Y}        {K, M, Y}
T5               {C, E, I, K, O, O}     {K, E, O}
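
Continuing the sketch above (reusing transactions and freq), the filtering and reordering step might look like the following; the alphabetical tie-break is an assumption that happens to reproduce the item order used in this example:

min_support = 3

# Frequent items, sorted by descending count; ties broken alphabetically.
L = sorted((item for item, count in freq.items() if count >= min_support),
           key=lambda item: (-freq[item], item))
print(L)  # ['K', 'E', 'M', 'O', 'Y']

# Ordered-Item set: the frequent items of each transaction, in L's order.
ordered = [[item for item in L if item in t] for t in transactions]
print(ordered)
# [['K','E','M','O','Y'], ['K','E','O','Y'], ['K','E','M'], ['K','M','Y'], ['K','E','O']]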
Now, all the Ordered-Item sets are inserted into a Trie Data Structure.

a) Inserting the set {K, E, M, O, Y}:

Here, all the items are simply linked one after the other in their order of occurrence in the set, and the support count of each new node is initialized to 1.

b) Inserting the set {K, E, O, Y}:

For the elements K and E, the support counts of the existing nodes are simply increased by 1. On inserting O, we can see that there is no direct link between E and O, so a new node for the item O is initialized with a support count of 1, and the node for E is linked to this new node. On inserting Y, a new node for the item Y is likewise initialized with a support count of 1 and linked from the new node for O.
c) Inserting the set {K, E, M}:

Here, the path K -> E -> M already exists, so the support count of each node along it is simply increased by 1.

d) Inserting the set {K, M, Y}:

Similar to step b), first the support count of K is increased, then new nodes for M and Y
are initialized and linked accordingly.
e) Inserting the set {K, E, O}:

Here, the support counts of the respective nodes are simply increased. Note that it is the support count of the new node for item O (created in step b) that is increased.
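
The five insertions above can be condensed into a short sketch of the trie insertion logic; the Node class and insert function are illustrative, not a full FP-tree implementation (a real one also maintains header-table links between nodes of the same item):

class Node:
    def __init__(self, item):
        self.item = item
        self.count = 0
        self.children = {}  # item -> child Node

def insert(root, ordered_itemset):
    node = root
    for item in ordered_itemset:
        if item not in node.children:      # no existing link: branch off (steps b, d)
            node.children[item] = Node(item)
        node = node.children[item]
        node.count += 1                    # shared prefix: bump the count (steps c, e)

root = Node(None)
for itemset in [["K","E","M","O","Y"], ["K","E","O","Y"],
                ["K","E","M"], ["K","M","Y"], ["K","E","O"]]:
    insert(root, itemset)

print(root.children["K"].count)                               # 5
print(root.children["K"].children["E"].children["O"].count)   # 2 (from T2 and T5)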

Now, for each item, the Conditional Pattern Base is computed: the prefix paths (path labels, with their counts) of all the paths in the frequent-pattern tree that lead to a node of the given item. Note that the items in the table below are arranged in ascending order of their frequencies:

Item   Conditional Pattern Base
Y      {K, E, M, O : 1}, {K, E, O : 1}, {K, M : 1}
O      {K, E, M : 1}, {K, E : 2}
M      {K, E : 2}, {K : 1}
E      {K : 4}
K      (empty)
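
As a sketch of how these prefix paths can be collected from the tree built above (reusing root from the previous sketch; a production implementation would follow header-table links rather than scanning the whole tree recursively):

def pattern_base(node, target, prefix, out):
    # Record the prefix path and count whenever we reach a node for `target`.
    if node.item == target and prefix:
        out.append((list(prefix), node.count))
    for child in node.children.values():
        pattern_base(child, target, prefix + ([node.item] if node.item else []), out)
    return out

print(pattern_base(root, "O", [], []))
# [(['K', 'E', 'M'], 1), (['K', 'E'], 2)]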
Now, for each item, the Conditional Frequent Pattern Tree is built. It is done by taking the set of items common to all the paths in the Conditional Pattern Base of that item, and calculating each such item's support count by summing the counts of all the paths in the Conditional Pattern Base:

Item   Conditional Frequent Pattern Tree
Y      {K : 3}
O      {K, E : 3}
M      {K : 3}
E      {K : 4}

From the Conditional Frequent Pattern Tree, the Frequent Pattern rules are generated by pairing the items of the Conditional Frequent Pattern Tree set with the corresponding item, as given in the table below:

Item   Frequent Patterns Generated
Y      {K, Y : 3}
O      {K, O : 3}, {E, O : 3}, {K, E, O : 3}
M      {K, M : 3}
E      {K, E : 4}

For each row, two types of association rules can be inferred. For example, for the first row, which contains the pattern {K, Y}, the rules K -> Y and Y -> K can be inferred. To determine which rule is valid, the confidence of both rules is calculated, and the one with confidence greater than or equal to the minimum confidence value is retained.
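
As a sketch of this final check for the pattern {K, Y : 3}, assuming a minimum confidence of 0.8 (the threshold is illustrative; the text does not fix one):

min_confidence = 0.8  # assumed threshold, not specified in the text

pair_support = {("K", "Y"): 3}   # support of the frequent pattern {K, Y}
item_support = {"K": 5, "Y": 3}  # supports of the individual items

for (a, b), s in pair_support.items():
    for antecedent, consequent in [(a, b), (b, a)]:
        conf = s / item_support[antecedent]  # confidence = support(A U C) / support(A)
        verdict = "kept" if conf >= min_confidence else "discarded"
        print(f"{antecedent} -> {consequent}: confidence {conf:.2f} ({verdict})")
# K -> Y: confidence 0.60 (discarded)
# Y -> K: confidence 1.00 (kept)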
