MINING FREQUENT ITEMSETS USING ADVANCED PARTITION
APPROACH
Nay Chi Lynn, Khin Myat Myat Moe
University of Computer Studies, Magway
naychelynn@gmail.com
Abstract
Frequent itemset mining plays an important part in many data mining
tasks and has been used in numerous practical applications, including
market basket analysis. This paper presents the mining of frequent
itemsets in a large database of medical sales transactions using the
advanced partition approach, which executes in two phases. In phase 1,
the approach logically divides the database into a number of
non-overlapping partitions. The partitions are considered one at a
time, and all local frequent itemsets for each partition are generated
using the Apriori method. In phase 2, the approach finds the final set
of frequent itemsets. The purpose of this paper is to extract the final
set of frequent itemsets from medical retail datasets and to provide
information that supports the planning of marketing or advertising
strategies for medical stores and companies. Algorithms for finding
frequent itemsets, such as Apriori, need many database scans, whereas
the advanced partition approach scans the entire database only once.
It therefore reduces the time spent scanning a large database when
mining frequent itemsets.

1. Introduction
Nowadays, retailing is becoming a high-performance sport. Like
athletes, retailers are becoming competitive, seeking technology to
gain an advantage, and trying to learn more about customer buying
behavior. Market basket analysis has emerged as a step in retail
merchandising, promotion, etc. Market basket analysis studies the
buying habits of customers by finding frequent itemsets among the
different items that customers purchase [3].
Mining frequent itemsets is fundamental to research on the mining of
association rules. Frequent itemset mining, a descriptive mining
technique, is widely used for market basket analysis. It analyses
customer buying habits by finding sets of items that frequently occur
together in a given set of data. There are many algorithms for finding
frequent itemsets, and Apriori is the basic one. In this system, the
advanced partition algorithm is used to find frequent itemsets in a
single pass over the entire database. From the results of system
testing, the user can judge which algorithm is suitable for his or her
medical company or store.

2. Related work
Work related to this system is presented here. Frequent itemset mining
is well explored for various data types. The Apriori algorithm, also
called the level-wise algorithm, was proposed by Agrawal and Srikant in
1994. The name of the algorithm reflects the fact that it uses prior
knowledge of frequent itemset properties [1]. Nguyen and Orlowska
present a data partitioning approach to further improve the performance
of frequent itemset computation. Their methods focus on reducing the
size of the input data required by partitioning-based algorithms [4].
Malreddy has proposed the advanced partition approach to generate the
frequent itemsets in a single pass over the database [2]. In this
paper, the advanced partition approach is used to find the frequent
itemsets.

3. Theory background
Frequent itemset mining is an important data mining task. It extracts
interesting correlations and frequent itemsets among sets of items in
transaction databases.

3.1 Frequent Itemsets
A set of items is referred to as an itemset. An itemset that contains k
items is a k-itemset. The set {computer, financial-management software}
is a 2-itemset. The occurrence frequency of an itemset is the number of
transactions that contain the itemset. An itemset satisfies minimum
support if its occurrence frequency is greater than or equal to the
product of min_sup and the total number of transactions in the
database. The number of transactions required for the itemset to
satisfy minimum support is therefore referred to as the minimum support
count. If an itemset satisfies minimum support, then it is a frequent
itemset [1].

3.2 Advanced Partition Approach
The advanced partition approach is based on the premise that the number
of items in a transaction is quite small compared to the total number
of items in the transaction database. The approach is efficient when
the local support of each frequent itemset in a partition is much
higher than 1.
The advanced partition approach executes in two phases.
Phase I: The approach divides the database into a number of
non-overlapping partitions. These partitions are considered one at a
time, and all frequent itemsets for each partition (Li) are generated
using the Apriori algorithm (presented in Section 3.2.1). When each
partition is taken separately for calculating its frequent itemsets,
the local minimum support is set to 1. At the end of Phase I, the
approach merges the local frequent itemsets of all partitions to
generate the global candidate itemsets (CkG).
Phase II: This phase simply prunes from the global candidate itemset
list those itemsets whose combined support s(c)Tc (the total support of
an itemset over all partitions) is less than the global minimum
support. The advanced partition approach therefore reads the entire
database only once, during Phase I. In addition, partition sizes are
chosen such that each partition can be accommodated in main memory [2].

Table 1 Notations used for advanced partition approach

Notation   Meaning
Li         Local Frequent Sets: set of local frequent itemsets of
           partition i.
CkG        Global Candidate Sets: set of global candidate k-itemsets.
Lik        Local Frequent Sets: set of local frequent k-itemsets in
           partition i.
LG         Global Frequent Sets: set of global frequent itemsets.
s(c)Tc     Combined Support: total support of candidate set c in all
           partitions.
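As a small illustration of the combined support listed in Table 1, the
following Python sketch counts one candidate itemset in two
non-overlapping partitions and adds the local counts. The transactions,
the threshold sigma, and the helper name support_count are assumptions
made for this example, not data or code from the system.

```python
def support_count(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)

# Two hypothetical non-overlapping partitions of a small sales database.
partition_1 = [
    {"Amoxy", "Flumox"},
    {"Amoxy", "Biogesic"},
    {"Amoxy", "Flumox", "Biogesic"},
]
partition_2 = [
    {"Flumox"},
    {"Amoxy", "Flumox"},
]
partitions = [partition_1, partition_2]

c = frozenset({"Amoxy", "Flumox"})   # one global candidate itemset
sigma = 3                            # assumed global minimum support count

# Local support of c in each partition, then the combined support.
local = [support_count(c, p) for p in partitions]
combined = sum(local)

# Because the partitions do not overlap, the combined support equals the
# support counted over the whole database.
whole = [t for p in partitions for t in p]
assert combined == support_count(c, whole)

is_frequent = combined >= sigma
print(local, combined, is_frequent)
```

Since the local minimum support is 1, every itemset that occurs in a
partition is kept with its local count, so summing the local counts
loses nothing and no second scan of the database is needed.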
Below is the algorithm of the advanced partition approach:

P = partition_database(T); n = number of partitions;
// Phase I
for i = 1 to n do
begin
    read_in_partition(Ti in P)
    Li = generate all frequent itemsets of Ti using the Apriori
         algorithm in main memory
end
// Merge Phase
for (k = 2; Lik ≠ ∅, i = 1, 2, …, n; k++) do
begin
    CkG = ∪(i = 1..n) Lik
end
// Phase II
LG = ∅;
for each c ∈ CG do
begin
    if s(c)Tc ≥ σ
        LG = LG ∪ {c}
end
Answer = LG

Figure 1: The advanced partition approach for discovering frequent
itemsets.

3.2.1 Apriori Algorithm
Apriori is an influential algorithm for mining frequent itemsets for
Boolean association rules. Apriori uses prior knowledge of frequent
itemset properties. It employs an iterative approach known as
level-wise search, where k-itemsets are used to explore
(k+1)-itemsets [1].
There are two steps in the Apriori algorithm:
1. The join step: To find Lk, a set of candidate k-itemsets is
   generated by joining Lk-1 with itself. This set of candidates is
   denoted Ck. Let l1 and l2 be itemsets in Lk-1. They are joined if
   their first (k - 2) items are the same.
2. The prune step: Ck is a superset of Lk; that is, its members may or
   may not be frequent, but all of the frequent k-itemsets are included
   in Ck. A scan of the database to determine the count of each
   candidate in Ck results in the determination of Lk.

Algorithm: Apriori. Find frequent itemsets using an iterative
level-wise approach based on candidate generation.
Input: Database, D, of transactions; minimum support threshold,
min_sup.
Output: L, frequent itemsets in D.
Method:
L1 = find_frequent_1-itemsets(D);
for (k = 2; Lk-1 ≠ ∅; k++) {
    Ck = apriori_gen(Lk-1, min_sup);
    for each transaction t ∈ D { // scan D for counts
        Ct = subset(Ck, t); // get the subsets of t that are candidates
        for each candidate c ∈ Ct
            c.count++;
    }
    Lk = {c ∈ Ck | c.count ≥ min_sup}
}
return L = ∪k Lk;

Procedure apriori_gen(Lk-1: frequent (k - 1)-itemsets; min_sup: minimum
support threshold)
for each itemset l1 ∈ Lk-1
    for each itemset l2 ∈ Lk-1
        if (l1[1] = l2[1]) ∧ (l1[2] = l2[2]) ∧ ... ∧
           (l1[k - 2] = l2[k - 2]) ∧ (l1[k - 1] < l2[k - 1]) then {
            c = l1 ⋈ l2; // join step: generate candidates
            if has_infrequent_subset(c, Lk-1) then
                delete c; // prune step: remove unfruitful candidate
            else add c to Ck;
        }
return Ck;

Procedure has_infrequent_subset(c: candidate k-itemset; Lk-1: frequent
(k - 1)-itemsets); // use prior knowledge
for each (k - 1)-subset s of c
    if s ∉ Lk-1 then
        return TRUE;
return FALSE; [1]

Figure 2: The Apriori algorithm

4. System design and implementation
This paper finds frequent itemsets among medical products purchased
together by customers. For the implementation, the retail datasets
supplied by a medical Co., Ltd. are used.
Using the advanced partition approach, the system computes frequent
itemsets efficiently and can reduce the time spent performing I/O
operations on large databases.
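The two-phase procedure of Figure 1, with a compact level-wise Apriori
in the spirit of Figure 2 as the local miner, can be sketched in Python
as follows. This is a minimal illustration under stated assumptions,
not the system's implementation: transactions are held in memory as
sets, partitions are contiguous slices of the list, the join step
unites any two frequent (k-1)-itemsets rather than using the ordered
prefix test of Figure 2, and the names apriori, partition_mine,
n_parts, and sigma are hypothetical.

```python
from collections import Counter
from itertools import combinations

def apriori(transactions, min_count):
    """Level-wise search; returns {itemset: support count}."""
    counts = Counter(frozenset([item]) for t in transactions for item in t)
    level = {c: n for c, n in counts.items() if n >= min_count}
    frequent, k = dict(level), 2
    while level:
        prev = set(level)
        # Join step: unite pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev
                             for s in combinations(c, k - 1))}
        # Count the surviving candidates in one pass over the transactions.
        counts = Counter(c for t in transactions for c in candidates if c <= t)
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        k += 1
    return frequent

def partition_mine(transactions, n_parts, sigma):
    """Phase I mines each partition with local minimum support 1 and
    merges the local counts; Phase II keeps the candidates whose
    combined support is at least sigma."""
    q, r = divmod(len(transactions), n_parts)
    combined, start = Counter(), 0
    for i in range(n_parts):
        end = start + q + (1 if i < r else 0)
        part = transactions[start:end]      # one partition in memory
        start = end
        for itemset, count in apriori(part, 1).items():
            combined[itemset] += count      # merge phase
    return {c: s for c, s in combined.items() if s >= sigma}

# Hypothetical transactions, used only to exercise the sketch.
transactions = [
    {"Amoxy", "Flumox"},
    {"Amoxy", "Biogesic"},
    {"Amoxy", "Flumox", "Biogesic"},
    {"Flumox"},
]
result = partition_mine(transactions, n_parts=2, sigma=2)
```

With a local minimum support of 1, the local miner enumerates every
itemset that occurs in a partition, which is affordable only under the
premise of Section 3.2 that transactions contain few items. On the data
above, the result agrees with running apriori(transactions, 2) over the
whole list.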
4.1 System Design
If the user wants to enter new medical products into the data set, the
Entry new items process should be chosen. If the user wants to record a
sale of those medical products, the Sales process should be chosen.
Moreover, the user can import items from other datasets using the
Import data menu. The system finds the global frequent itemsets using
the advanced partition approach. The user has to specify the global
minimum support count either as a number or as a percentage.

[Flow diagram boxes: begin; Choose process menu; Entry new items; Sales
process; Import data; Calculate frequent itemsets using advanced
partition approach; Transaction data; Choose (global) frequent
itemsets; decision s(c) >= (global) min-sup; no: Prune those itemsets;
yes: Generate global frequent itemsets; Display global frequent
itemsets; end]
Figure 3: System flow diagram

4.2 Implementation of the advanced partition approach
The following example shows the working of the system. The transaction
dataset (T) consists of a set of transactions of the form
(tid, items-purchased), where tid is the transaction ID. The system
includes medical sales items such as Lensan Para, Para BPI, Amoxy,
I Amox, Flumox, Brumox, Biogesic, etc.
The advanced partition approach executes in two phases. In this form,
medical sales itemsets are partitioned according to the user-specified
partition number (N). If the user inputs a partition number (N) of 2,
the advanced partition approach (phase 1) divides the sales itemsets
into two partitions, partition 1 and partition 2, as shown in Figure 4.

Figure 4: Partition dataset form

Then, the advanced partition approach (phase 1) finds all local
frequent itemsets of each partition using the Apriori method, as shown
in Figure 5.
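The partitioning by a user-specified number N and the choice of the
global minimum support as a number or as a percentage, both described
in this section, can be sketched in Python as follows. The function
names partition_database and min_support_count, the contiguous split,
the rounding up of a percentage threshold, and the sample transactions
are assumptions made for illustration; the paper does not state how the
system splits the data or rounds the threshold.

```python
import math

def partition_database(transactions, n):
    """Split a list of (tid, items) pairs into n contiguous,
    non-overlapping partitions whose sizes differ by at most one."""
    q, r = divmod(len(transactions), n)
    parts, start = [], 0
    for i in range(n):
        end = start + q + (1 if i < r else 0)
        parts.append(transactions[start:end])
        start = end
    return parts

def min_support_count(value, total, percent=False):
    """Return the global minimum support count. If percent is True,
    value is a percentage of the total number of transactions and is
    rounded up (assumed rounding rule)."""
    if percent:
        return math.ceil(value * total / 100)
    return int(value)

# Hypothetical transactions of the form (tid, items-purchased).
T = [
    (1, {"Lensan Para", "Amoxy"}),
    (2, {"Para BPI"}),
    (3, {"Amoxy", "Flumox"}),
    (4, {"Biogesic"}),
    (5, {"Brumox", "I Amox"}),
]

parts = partition_database(T, 2)
print([len(p) for p in parts])
print(min_support_count(10, 200))
print(min_support_count(5, 200, percent=True))
```

Every transaction lands in exactly one partition, which is what allows
the local counts of phase 1 to be added up in phase 2.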
Figure 5: Local frequent itemsets form

The advanced partition approach (phase 2) finds the global frequent
itemsets that satisfy the user-specified minimum support (if the
user-specified minimum support count is 10, it shows the itemsets whose
support count is at least 10), as shown in Figure 6, and then gives the
final set of (global) frequent itemsets.

Figure 6: Final set of frequent itemsets form

5. Conclusions
Mining frequent itemsets is important and is one of the primary
sub-areas in the field of data mining. Market basket data analysis has
been well addressed in mining frequent itemsets for discovering the set
of large items. This system is intended to implement frequent itemset
mining.
The discovery of frequent patterns and correlation relationships among
huge amounts of data is useful in selective marketing, decision
analysis, and business management. This system can help retailers,
buyers, planners, merchandisers, and store managers to plan more
profitable advertising and promotions, attract more customers, and
increase the value of the market basket.
Moreover, one can use the results to plan marketing or advertising
strategies, or in the design of a new catalog. For instance, the
results may help managers to design different store layouts. In one
strategy, items that are frequently purchased together can be placed in
close proximity in order to further encourage the sale of such items
together. This system can act as a consultant for medical stores by
giving information about frequent items.

References
[1] J. W. Han, M. Kamber, “Data Mining: Concepts and Techniques”,
ISBN 1-55860-489-8, Morgan Kaufmann Publishers.
[2] Kranthi K. Malreddy, B.S., “Mining Frequent Itemsets Using Advanced
Partition Approach”, Dean of the Graduate School, December 2004.
[3] Lary Gordan, Partner, “Leading Practices in Market Basket
Analysis”, the Face Point Group, 349 First Street, Los Altos,
CA 94022, (650) 559-2105, Gordan@Facepoint.Com
[4] Son N. Nguyen, Maria E. Orlowska, “A Further Study in the Data
Partitioning Approach for Frequent Itemsets Mining”, School of
Information Technology and Electrical Engineering, The University of
Queensland, QLD 4072, Australia, {nnson, maria}@itee.uq.edu.au.