
Pincer-Search Algorithm for Discovering Maximum Frequent Set

Introduction

The Pincer-Search algorithm was proposed by Dao-I Lin and Zvi M. Kedem of New York University in 1997. It uses both the top-down and the bottom-up approaches to association rule mining, and is a modification of the original Apriori algorithm of R. Agrawal and R. Srikant. The main search direction is bottom-up (as in Apriori), except that a restricted top-down search is conducted simultaneously; its purpose is to maintain an additional data structure called the Maximum Frequent Candidate Set (MFCS). As output the algorithm produces the Maximum Frequent Set, i.e. the set containing all maximal frequent itemsets, which immediately specifies all frequent itemsets. The algorithm specializes in dealing with maximal frequent itemsets of large length. The authors drew their inspiration from the notion of version spaces in Mitchell's machine-learning work.

Concepts used:

Let I = {i1, i2, ..., im} be a set of m distinct items.

Transaction: A transaction T is defined as any subset of items in I.

Database: A database D is a set of transactions.


Itemset: A set of items is called an itemset.

Length of an Itemset: the number of items in the itemset. Itemsets of length k are referred to as k-itemsets.

Support Count: the total number of appearances of a particular pattern in a database.

In other terms: if T is a transaction in database D and X is an itemset, then T is said to support X if X ⊆ T; the support of X is then the fraction of transactions in D that support X.

Frequent Itemset: An itemset whose support count is greater than or equal

to the minimum support threshold specified by the user.

Infrequent Itemset: An itemset which is not frequent is infrequent.
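As a small illustration of these definitions (a sketch of our own, not from the paper), support can be computed directly from a list of transactions, each represented as a Python set of items:

def support_count(itemset, database):
    # Number of transactions that contain every item of `itemset`.
    return sum(1 for t in database if itemset <= t)

def is_frequent(itemset, database, min_count):
    # Frequent means the support count reaches the minimum support threshold.
    return support_count(itemset, database) >= min_count

db = [{"Burger", "Coke", "Juice"}, {"Juice", "Potato Chips"}]
print(support_count(frozenset({"Juice"}), db))             # 2
print(is_frequent(frozenset({"Burger", "Coke"}), db, 1))   # True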

Downward Closure Property (Basis for Top-down Search): states that “If

an itemset is frequent then all its subsets must be frequent.”

Upward Closure Property (Basis for Bottom-up Search): states that “If an

itemset is infrequent then all its supersets must be infrequent.”

Maximal Frequent Itemset: a frequent itemset all of whose proper supersets are infrequent (its proper subsets are, of course, frequent).

Maximum Frequent Set: the set of all maximal frequent itemsets.


Association Rule: a rule of the form R : X ⇒ Y, where X and Y are two non-empty and non-intersecting itemsets. The support of rule R is defined as the support of X ∪ Y; its confidence is defined as support(X ∪ Y) / support(X).

Interesting Association Rule: an association rule is said to be interesting if its support and confidence are greater than or equal to the minimum support and minimum confidence thresholds (specified by the user), respectively.
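A minimal sketch of these two measures in the same representation as above (the function name is ours); comparing the returned values against the two thresholds identifies the interesting rules:

def rule_support_confidence(x, y, database):
    # Support and confidence of the rule X => Y for disjoint itemsets X and Y.
    n = len(database)
    sup_xy = sum(1 for t in database if (x | y) <= t)
    sup_x = sum(1 for t in database if x <= t)
    return sup_xy / n, (sup_xy / sup_x if sup_x else 0.0)

db = [{"Burger", "Coke", "Juice"}, {"Coke", "Burger"}, {"Juice", "Potato Chips"}]
print(rule_support_confidence(frozenset({"Burger"}), frozenset({"Coke"}), db))
# (0.666..., 1.0): Burger occurs in two of three transactions, always with Coke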

Candidate Itemsets: itemsets that are to be tested to determine whether they are frequent or infrequent.

Maximum Frequent Candidate Set (MFCS): a minimum cardinality set of itemsets such that the union of all the subsets of its elements contains all frequent itemsets but contains no infrequent itemset, i.e. it is a minimum cardinality set satisfying the conditions:

FREQUENT ⊆ ∪ { 2^X | X ∈ MFCS }
INFREQUENT ∩ ∪ { 2^X | X ∈ MFCS } = ∅

where FREQUENT and INFREQUENT stand, respectively, for all itemsets classified so far as frequent and as infrequent.
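The two conditions can be checked mechanically. Below is a small sketch of such a check (the function name and representation are our own): an itemset is covered by MFCS when it is a subset of some element of MFCS.

def is_valid_mfcs(mfcs, frequent, infrequent):
    # An itemset is covered by MFCS when it is a subset of some element.
    def covered(s):
        return any(s <= m for m in mfcs)
    # FREQUENT must be covered entirely; INFREQUENT must not be covered at all.
    return all(covered(f) for f in frequent) and not any(covered(i) for i in infrequent)

mfcs = {frozenset("ABC"), frozenset("CD")}
print(is_valid_mfcs(mfcs, {frozenset("AB")}, {frozenset("AD")}))   # True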


Pincer-Search Method

Pincer Search combines the following two approaches:

Bottom-up: generate small itemsets first and build candidate supersets from the frequent ones.

Top-down: start from the largest candidate itemsets and move down to their subsets.

It also uses two special properties:

Downward Closure Property: if an itemset is frequent, then all its subsets must be frequent.

Upward Closure Property: if an itemset is infrequent, then all its supersets must be infrequent.

Using these two properties to prune candidate sets decreases the computation time considerably.

It uses both approaches for pruning in the following way:

If some maximal frequent itemset is found in the top-down direction, this itemset can be used to eliminate (possibly many) candidates in the bottom-up direction: the subsets of this frequent itemset must themselves be frequent and hence need no further counting (by the downward closure property).

If an infrequent itemset is found in the bottom-up direction, it can be used to eliminate candidates found so far in the top-down search (by the upward closure property).


This two-way approach makes use of both the properties and speeds up the

search for maximum frequent set.
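As a toy illustration of the two pruning directions in Python (the items A–E and this minimal code are ours, not the paper's):

from itertools import combinations

# Top-down: {A,B,C,D} in MFCS was counted and found frequent, so every
# 2-candidate that is one of its subsets needs no further counting.
frequent_top = frozenset("ABCD")
c2 = {frozenset(p) for p in combinations("ABCDE", 2)}
c2 = {c for c in c2 if not c <= frequent_top}
print(sorted("".join(sorted(c)) for c in c2))    # ['AE', 'BE', 'CE', 'DE']

# Bottom-up: {A,B} was counted and found infrequent, so the MFCS element
# {A,B,C,D,E} is split into {A,C,D,E} and {B,C,D,E} (the full MFCS_gen,
# shown later, also discards new sets covered by other MFCS elements).
infrequent = frozenset("AB")
mfcs = {m - {e} for m in [frozenset("ABCDE")] if infrequent <= m for e in infrequent}
print(sorted("".join(sorted(m)) for m in mfcs))  # ['ACDE', 'BCDE']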

The algorithm begins by generating 1-itemset candidates, as the Apriori algorithm does, but it additionally uses the top-down search to prune the candidates produced in each pass. This is done with the help of the MFCS.

Let MFS denote the maximum frequent set, which stores all maximal frequent itemsets found during execution. At any time during execution MFCS is a superset of MFS, and the algorithm terminates when MFCS equals MFS.

In each pass over the database, in addition to counting the supports of the candidates in the bottom-up direction, the algorithm also counts the supports of the itemsets in MFCS; it is this set that drives the top-down search.

Consider now some pass k, during which itemsets of size k are to be classified. If some element of MFCS, say X, of cardinality greater than k, is found to be frequent in this pass, then all of its subsets are frequent. All of its subsets of cardinality k are therefore pruned from the set of candidates considered in the bottom-up direction in this pass; they and their supersets will never be candidates again. They are not forgotten, however, and are used at the end.


Similarly, when an itemset is found to be infrequent in the bottom-up direction, the algorithm uses it to update MFCS, so that no itemset in MFCS has this infrequent itemset as a subset.

Through the use of MFCS, maximal frequent itemsets can be found early in the execution, which can improve performance drastically.

Note: in general, unlike the search in the bottom-up direction, which goes up one level per pass, the top-down search can go down many levels in a single pass.

Now let us look at an algorithmic representation of Pincer Search to see how the above concepts are applied:

Pincer-Search algorithm:

L0 = ∅; k = 1; C1 = {{i} | i ∈ I}; S0 = ∅;
MFCS = {{1, 2, ..., n}}; MFS = ∅;
do until Ck = ∅ and Sk-1 = ∅
    read the database and count supports for Ck and MFCS;
    MFS = MFS ∪ {frequent itemsets in MFCS};
    Sk = {infrequent itemsets in Ck};
    if Sk ≠ ∅, call the MFCS_gen procedure;
    call the MFS_prune procedure;
    generate candidates Ck+1 from Lk;
    if any frequent itemset in Ck was removed by MFS_prune,
        call the recovery procedure to restore candidates to Ck+1;
    call the MFCS_prune procedure to prune candidates in Ck+1;
    k = k + 1;
return MFS
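The loop above translates almost line for line into Python. The following is our sketch of it, not the authors' implementation: itemsets are frozensets, min_count is the user's minimum support count, and mfcs_gen, recovery, mfs_prune and mfcs_prune are the procedures sketched after their pseudocode below. Candidate generation uses the plain Apriori join (the join's subset-based pruning step is omitted for brevity):

def apriori_join(lk):
    # C_{k+1}: unions of two itemsets of L_k that differ in exactly one item.
    return {a | b for a in lk for b in lk if len(a | b) == len(a) + 1}

def pincer_search(database, items, min_count):
    mfcs = {frozenset(items)}              # MFCS starts as one all-items set
    mfs = set()
    ck = {frozenset({i}) for i in items}   # C_1
    sk_prev = set()                        # S_{k-1}; S_0 is empty
    k = 1
    while ck or sk_prev:                   # until C_k and S_{k-1} are both empty
        # One database pass: count C_k together with the itemsets in MFCS.
        counts = {c: sum(1 for t in database if c <= t) for c in ck | mfcs}
        mfs |= {m for m in mfcs if counts[m] >= min_count}
        lk = {c for c in ck if counts[c] >= min_count}
        sk = {c for c in ck if counts[c] < min_count}
        if sk:
            mfcs = mfcs_gen(mfcs, sk)
        lk, removed = mfs_prune(lk, mfs)
        ck1 = apriori_join(lk)
        if removed:                        # frequent itemsets were pruned away
            ck1 |= recovery(lk, mfs, k)
        ck = mfcs_prune(ck1, mfcs)
        sk_prev = sk
        k += 1
    return mfs

Frequent itemsets found in MFCS are left in MFCS, preserving the invariant that MFCS is a superset of MFS; they are simply recounted on later passes, which is wasteful but keeps the sketch short.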

MFCS_gen:

for all itemsets s ∈ Sk
    for all itemsets m ∈ MFCS
        if s is a subset of m
            MFCS = MFCS \ {m}
            for all items e ∈ s
                if m \ {e} is not a subset of any itemset in MFCS
                    MFCS = MFCS ∪ {m \ {e}}
return MFCS
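In Python (our sketch, same conventions as above):

def mfcs_gen(mfcs, sk):
    # Replace every MFCS element containing a newly found infrequent
    # itemset s by its maximal subsets that avoid s.
    mfcs = set(mfcs)
    for s in sk:
        for m in [m for m in mfcs if s <= m]:
            mfcs.remove(m)
            for e in s:
                new = m - {e}
                # Keep the new set only if no other MFCS element covers it.
                if not any(new <= other for other in mfcs):
                    mfcs.add(new)
    return mfcs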

Recovery:

for all itemsets l ∈ Lk
    for all itemsets m ∈ MFS
        if the first k-1 items in l are also in m
            /* let j be the position in m of l.item(k-1) */
            for i from j+1 to |m|
                Ck+1 = Ck+1 ∪ {{l.item1, ..., l.itemk, m.itemi}}
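A Python rendering of this step (our sketch; it assumes the items of an itemset are kept in sorted order, so that "the first k-1 items" and the position j are well defined):

def recovery(lk, mfs, k):
    # Re-create the C_{k+1} candidates the plain join misses because some
    # frequent itemsets were removed from L_k by MFS_prune.
    out = set()
    for l in lk:
        prefix = sorted(l)[:k - 1]         # the first k-1 items of l
        for m in mfs:
            if not set(prefix) <= m:
                continue
            mitems = sorted(m)
            # j: position in m of the last item of l's prefix.
            start = mitems.index(prefix[-1]) + 1 if prefix else 0
            for extra in mitems[start:]:
                if extra not in l:
                    out.add(l | {extra})
    return out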

MFS_prune:

for all itemsets c ∈ Lk
    if c is a subset of any itemset in the current MFS
        delete c from Lk
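In Python (our sketch; unlike the pseudocode, which deletes in place, this version also returns the removed itemsets, since the main loop needs them to decide whether to call recovery):

def mfs_prune(lk, mfs):
    # Drop from L_k every itemset already covered by some maximal frequent
    # itemset; such itemsets need no further expansion in this direction.
    removed = {l for l in lk if any(l <= m for m in mfs)}
    return lk - removed, removed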

MFCS_prune:

for all itemsets c ∈ Ck+1
    if c is not a subset of any itemset in the current MFCS
        delete c from Ck+1
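In Python (our sketch):

def mfcs_prune(ck1, mfcs):
    # Keep only candidates that can still be frequent, i.e. those that are
    # a subset of some itemset in the current MFCS.
    return {c for c in ck1 if any(c <= m for m in mfcs)}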

MFCS is initialized to contain a single itemset, which consists of all the database items. MFCS is updated whenever new infrequent itemsets are found. If an itemset in MFCS is found to be frequent, its subsets will not participate in the subsequent support counting and candidate generation steps. If some itemsets in Lk are removed by MFS_prune, the algorithm calls the recovery procedure to recover the missing candidates.

Let us apply this algorithm to the following example, taking the minimum support count to be 1 (an itemset is frequent if it appears in at least one transaction):

Example 1: Customer Basket Database

Transaction    Products
1              Burger, Coke, Juice
2              Juice, Potato Chips
3              Coke, Burger
4              Juice, Ground Nuts
5              Coke, Ground Nuts

Step 1: L0 = ∅, k = 1;
C1 = {{Burger}, {Coke}, {Juice}, {Potato Chips}, {Ground Nuts}}
MFCS = {{Burger, Coke, Juice, Potato Chips, Ground Nuts}}
MFS = ∅

Pass 1: The database is read to count supports, as follows:

{Burger} → 2, {Coke} → 3, {Juice} → 3, {Potato Chips} → 1, {Ground Nuts} → 2
{Burger, Coke, Juice, Potato Chips, Ground Nuts} → 0

So MFCS = {{Burger, Coke, Juice, Potato Chips, Ground Nuts}} and MFS = ∅.
L1 = {{Burger}, {Coke}, {Juice}, {Potato Chips}, {Ground Nuts}}
S1 = ∅

Since S1 = ∅, MFCS does not need to be updated at this point.

C2 = {{Burger, Coke}, {Burger, Juice}, {Burger, Potato Chips}, {Burger, Ground Nuts}, {Coke, Juice}, {Coke, Potato Chips}, {Coke, Ground Nuts}, {Juice, Potato Chips}, {Juice, Ground Nuts}, {Potato Chips, Ground Nuts}}

Pass 2: Read the database to count the supports of the elements of C2 and MFCS, as given below:

{Burger, Coke} → 2, {Burger, Juice} → 1,
{Burger, Potato Chips} → 0, {Burger, Ground Nuts} → 0,
{Coke, Juice} → 1, {Coke, Potato Chips} → 0,
{Coke, Ground Nuts} → 1, {Juice, Potato Chips} → 1,
{Juice, Ground Nuts} → 1, {Potato Chips, Ground Nuts} → 0
{Burger, Coke, Juice, Potato Chips, Ground Nuts} → 0

MFS = ∅
L2 = {{Burger, Coke}, {Burger, Juice}, {Coke, Juice}, {Coke, Ground Nuts}, {Juice, Ground Nuts}, {Juice, Potato Chips}}
S2 = {{Burger, Potato Chips}, {Burger, Ground Nuts}, {Coke, Potato Chips}, {Potato Chips, Ground Nuts}}

Now MFCS_gen runs over S2. For {Burger, Potato Chips} in S2 and {Burger, Coke, Juice, Potato Chips, Ground Nuts} in MFCS, the MFCS element is replaced by two new elements, {Burger, Coke, Juice, Ground Nuts} and {Coke, Juice, Potato Chips, Ground Nuts}.

For {Burger, Ground Nuts} in S2 and {Coke, Juice, Potato Chips, Ground Nuts} in MFCS: since {Burger, Ground Nuts} is not contained in this element, no action is taken. For {Burger, Coke, Juice, Ground Nuts} in MFCS we get two new elements, {Burger, Coke, Juice} and {Coke, Juice, Ground Nuts}; since {Coke, Juice, Ground Nuts} is already contained in {Coke, Juice, Potato Chips, Ground Nuts}, it is not added to MFCS.

Now MFCS = {{Burger, Coke, Juice}, {Coke, Juice, Potato Chips, Ground Nuts}}.

Next, for {Coke, Potato Chips} in S2 (a subset only of {Coke, Juice, Potato Chips, Ground Nuts}, which is replaced by {Juice, Potato Chips, Ground Nuts} and {Coke, Juice, Ground Nuts}), we get

MFCS = {{Burger, Coke, Juice}, {Coke, Juice, Ground Nuts}, {Juice, Potato Chips, Ground Nuts}}.

Finally, for {Potato Chips, Ground Nuts} in S2 ({Juice, Potato Chips, Ground Nuts} is replaced by {Juice, Potato Chips} and by {Juice, Ground Nuts}, the latter already covered by {Coke, Juice, Ground Nuts}), we get

MFCS = {{Burger, Coke, Juice}, {Coke, Juice, Ground Nuts}, {Juice, Potato Chips}}.

Now we generate candidate sets from L2 and prune them against MFCS:

C3 = {{Burger, Coke, Juice}, {Coke, Juice, Ground Nuts}}

(The join of L2 also produces candidates such as {Juice, Potato Chips, Ground Nuts}, but MFCS_prune deletes every candidate that is not a subset of some itemset in the current MFCS.)

Subsequent passes count the supports of these candidates together with the itemsets remaining in MFCS, and the algorithm stops once no candidates remain to be tested and MFCS equals MFS. We then do one scan to calculate the actual support counts of all maximal frequent itemsets and their subsets, and go on to generate rules, using the minimum confidence threshold, to obtain all interesting rules.
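For reference, running the Python sketch assembled in the previous sections on this database, with a minimum support count of 1, yields exactly the maximal frequent itemsets implied by the walkthrough:

db = [{"Burger", "Coke", "Juice"},
      {"Juice", "Potato Chips"},
      {"Coke", "Burger"},
      {"Juice", "Ground Nuts"},
      {"Coke", "Ground Nuts"}]
items = {"Burger", "Coke", "Juice", "Potato Chips", "Ground Nuts"}

for m in sorted(pincer_search(db, items, min_count=1), key=sorted):
    print(sorted(m))
# ['Burger', 'Coke', 'Juice']
# ['Coke', 'Ground Nuts']
# ['Juice', 'Ground Nuts']
# ['Juice', 'Potato Chips']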
