0% found this document useful (0 votes)
16 views22 pages

PrefixSpan Final

The PrefixSpan algorithm is designed for efficient discovery of frequent sequential patterns in sequence databases using database projection and depth-first search. It is commonly applied in areas such as web clickstream analysis and customer behavior prediction, despite not being the most efficient algorithm. The algorithm is favored for its simplicity and ease of extension.

Uploaded by

vineetsuradkar
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
16 views22 pages

PrefixSpan Final

The PrefixSpan algorithm is designed for efficient discovery of frequent sequential patterns in sequence databases using database projection and depth-first search. It is commonly applied in areas such as web clickstream analysis and customer behavior prediction, despite not being the most efficient algorithm. The algorithm is favored for its simplicity and ease of extension.

Uploaded by

vineetsuradkar
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 22

The PrefixSpan algorithm

• The PrefixSpan (Prefix-projected Sequential pattern


mining) algorithm is an efficient method for discovering
frequent sequential patterns in a sequence database.
• This algorithm uses a concept of database projection
and a depth-first search.
• It is widely used in applications like web clickstream
analysis, customer behavior prediction, and
bioinformatics.
• This is not the most efficient algorithm, but it is simple and
easy to extend, so it is popular.
Example
This is the input:

𝑆1 = ⟨ 𝑎, 𝑏 ,𝑐 , 𝑎 ⟩
Sequence database

𝑆2 = ⟨ 𝑎, 𝑏 ,𝑏 , 𝑐 ⟩
𝑆3 = ⟨ 𝑏 , 𝑐 , {𝑑}⟩
𝑆4 = ⟨ 𝑏 , 𝑎, 𝑏 ,
{𝑐}⟩

𝑚𝑖𝑛𝑠𝑢𝑝 =
3
Step 1
PrefixSpan eliminates infrequent items:

𝑆1 = ⟨ 𝑎, 𝑏 ,𝑐 , 𝑎 ⟩
Sequence database
⟨ 𝑎 ⟩ support :
Result:
𝑆2 = ⟨ 𝑎, 𝑏 ,𝑏 , 𝑐 ⟩
𝑆3 = ⟨ 𝑏 , 𝑐 , {𝑑}⟩
3
⟨ 𝑏 ⟩ support :
𝑆4 = ⟨ 𝑏 , 𝑎, 𝑏 ,
{𝑐}⟩
4
⟨ 𝑐 ⟩ support :
𝑚𝑖𝑛𝑠𝑢𝑝 =
3
4
⟨ 𝑑 ⟩ support :
1
Step 2 – Find patterns starting with ⟨ 𝑎

PrefixSpan does a database projection with ⟨ 𝑎
⟩: Sequence database
𝑆1 = ⟨ 𝑎, 𝑏 ,𝑐 , 𝑎 ⟩ ⟨ 𝑎 ⟩𝑆1 = ⟨ 𝑎, 𝑏 ,𝑐 , 𝑎 ⟩
Projected database of

𝑆2 = ⟨ 𝑎, 𝑏 ,𝑏 , 𝑐 ⟩ 𝑆2 = ⟨ 𝑎, 𝑏 ,𝑏 , 𝑐 ⟩
𝑆3 = ⟨ 𝑏 , 𝑐 , {𝑑}⟩ 𝑆3 = ⟨ 𝑏 , 𝑐 , {𝑑}⟩
𝑆4 = ⟨ 𝑏 , 𝑎, 𝑏 , 𝑆4 = ⟨ 𝑏 , 𝑎, 𝑏 ,
{𝑐}⟩ {𝑐}⟩

𝑚𝑖𝑛𝑠𝑢𝑝 =
3
Step 3 – Find patterns starting with ⟨ 𝑎

PrefixSpan does a database projection with ⟨ 𝑎
⟩: Sequence database
𝑆1 = ⟨ 𝑎, 𝑏 ,𝑐 , 𝑎 ⟩ ⟨ 𝑎 ⟩𝑆1 = ⟨ _𝑏 , 𝑐 , 𝑎 ⟩
Projected database of

𝑆2 = ⟨ 𝑎, 𝑏 ,𝑏 , 𝑐 ⟩ 𝑆2 = ⟨ _𝑏 , 𝑏 , 𝑐 ⟩
𝑆3 = ⟨ 𝑏 , 𝑐 , {𝑑}⟩ 𝑆4 = ⟨ _𝑏 , {𝑐}⟩
𝑆4 = ⟨ 𝑏 , 𝑎, 𝑏 ,
{𝑐}⟩

𝑚𝑖𝑛𝑠𝑢𝑝 =
3
Step 3 – Find patterns starting with ⟨ 𝑎

Projected database of ⟨ 𝑎 ⟩
𝑆1 = ⟨ _𝑏 , 𝑐 , 𝑎 ⟩
𝑆2 = ⟨ _𝑏 , 𝑏 , 𝑐 ⟩
𝑆4 = ⟨ _𝑏 , {𝑐}⟩

⟨ 𝑎 , {𝑎}⟩ support :
Result:

𝑚𝑖𝑛𝑠𝑢𝑝 =
1

3 ⟨ 𝑎 , {𝑏}⟩ support :
𝑎,
𝑏
support :
1
⟨ 𝑎 , {𝑐}⟩ support:
3

3
Step 4 – Find patterns starting with ⟨ 𝑎 ,
{𝑐}⟩
PrefixSpan does a database projection with ⟨ 𝑎 , 𝑐
⟩:
⟨ 𝑎 ⟩𝑆1 = Projected database of ⟨ 𝑎 ,
⟨ _𝑏 , 𝑐 , 𝑎 ⟩
Projected database of
{𝑐}⟩ 𝑆1 = ⟨ _𝑏 , 𝑐 ,
𝑆2 = ⟨ _𝑏 , 𝑏 , 𝑐 ⟩ 𝑎 ⟩
𝑆4 = ⟨ _𝑏 , {𝑐}⟩ 𝑆2 = ⟨ _𝑏 , 𝑏 ,
𝑐 ⟩
𝑆4 = ⟨ _𝑏 , {𝑐}⟩

𝑚𝑖𝑛𝑠𝑢𝑝 =
3
Step 4 – Find patterns starting with ⟨ 𝑎 ,
{𝑐}⟩
PrefixSpan does a database projection with ⟨ 𝑎 , 𝑐
⟩:
⟨ 𝑎 ⟩𝑆1 = Projected database of ⟨ 𝑎 ,
⟨ _𝑏 , 𝑐 , 𝑎 ⟩
Projected database of
{𝑐}⟩ 𝑆 = ⟨ 𝑎 ⟩
𝑆2 = ⟨ _𝑏 , 𝑏 , 𝑐 ⟩ 1

𝑆4 = ⟨ _𝑏 , {𝑐}⟩

𝑚𝑖𝑛𝑠𝑢𝑝 =
3
Step 4 – Find patterns starting with ⟨ 𝑎 ,
{𝑐}⟩
, 𝑐 ⟩ that has one more item:
Then, PrefixSpan counts the support of each sequential pattern

𝑎
starting with
Projected database of ⟨ 𝑎 ,
{𝑐}⟩ 𝑆 = ⟨ 𝑎 ⟩
1

⟨ 𝑎 , 𝑐 , {𝑎}⟩ support :
Result:

𝑚𝑖𝑛𝑠𝑢𝑝 =
1

3
Step 5 – Find patterns starting with ⟨ 𝑎, 𝑏

PrefixSpan does a database projection with ⟨{𝑎,
𝑏}⟩:
⟨ _𝑏 , of
𝑆1 = database 𝑐 ,⟨𝑎𝑎 ⟩⟩
𝑆2 = ⟨ _𝑏 , 𝑏 , 𝑐 ⟩
Projected

𝑆4 = ⟨ _𝑏 , {𝑐}⟩

𝑚𝑖𝑛𝑠𝑢𝑝 =
3
Step 5 – Find patterns starting with ⟨ 𝑎, 𝑏

PrefixSpan does a database projection with ⟨{𝑎,
𝑏}⟩: database of
⟨ 𝑎 ⟩𝑆1 = Projected database of ⟨{𝑎,
⟨ _𝑏 , 𝑐 , 𝑎 ⟩
Projected
𝑏}⟩ S1 = ⟨ _ , 𝑐 , 𝑎 ⟩
𝑆2 = ⟨ _𝑏 , 𝑏 , 𝑐 ⟩ 𝑏
𝑆4 = ⟨ _𝑏 , {𝑐}⟩
𝑆2 = ⟨ _𝑏 , 𝑏 , 𝑐 ⟩
𝑆4 = ⟨ _𝑏 , {𝑐}⟩

𝑚𝑖𝑛𝑠𝑢𝑝 =
3
Step 5 – Find patterns starting with ⟨ 𝑎, 𝑏

PrefixSpan does a database projection with ⟨{𝑎,
𝑏}⟩: database of
⟨ 𝑎 ⟩𝑆1 = Projected database of ⟨{𝑎,
⟨ _𝑏 , 𝑐 , 𝑎 ⟩
Projected
𝑏}⟩ 𝑆1 = ⟨ 𝑐, 𝑎 ⟩
𝑆2 = ⟨ _𝑏 , 𝑏 , 𝑐 ⟩
𝑆2 = ⟨ 𝑏 ,𝑐⟩
𝑆4 = ⟨ _𝑏 , {𝑐}⟩
𝑆4 = ⟨ {𝑐}⟩

𝑚𝑖𝑛𝑠𝑢𝑝 =
3
Step 5 – Find patterns starting with ⟨ 𝑎, 𝑏

Then, PrefixSpan removes infrequent patterns:

Projected database of ⟨{𝑎,


𝑏}⟩ 𝑆1 = ⟨ 𝑐, 𝑎 ⟩
𝑆2 = ⟨ 𝑏 ,𝑐⟩
𝑆4 = ⟨ {𝑐}⟩

⟨ 𝑎, 𝑏 , {𝑐}⟩ support
Result:
𝑚𝑖𝑛𝑠𝑢𝑝 =
3
:3
⟨ 𝑎, 𝑏 , {𝑏}⟩
{𝑎}⟩ support
:1
𝑎, 𝑏 ,
{𝑐}
Step 6 – Find patterns starting with

PrefixSpan does a database projection for ⟨ 𝑎, 𝑏 ,


{𝑐}⟩:
Projected database of ⟨{𝑎, Projected database of ⟨ 𝑎, 𝑏 ,
𝑏}⟩ 𝑆1 = ⟨ 𝑐, 𝑎 ⟩ {𝑐}⟩ 𝑆 = ⟨ 𝑎 ⟩
𝑆2 = ⟨ 𝑏 ,𝑐⟩
1

𝑆4 = ⟨ {𝑐}⟩

𝑚𝑖𝑛𝑠𝑢𝑝 =
3
𝑎, 𝑏 ,
{𝑐}
Step 6 – Find patterns starting with

starting with ⟨ 𝑎, 𝑏 , {𝑐}⟩ that has one more item:


Then, PrefixSpan counts the support of each sequential pattern

Projected database of ⟨ 𝑎, 𝑏 ,
{𝑐}⟩ 𝑆 = ⟨ 𝑎 ⟩
1

⟨ 𝑎, 𝑏 , 𝑐 , {𝑎}⟩
Result:
𝑚𝑖𝑛𝑠𝑢𝑝 =
3
support : 1

This pattern is
infrequent!
{𝑏}
Step 7 – Find patterns starting with

PrefixSpan does a database projection for ⟨ 𝑏


⟩:

𝑆1 = ⟨ 𝑎, 𝑏 ,𝑐 , 𝑎 ⟩ ⟨{𝑏}⟩ 𝑆1 = ⟨ 𝑎, 𝑏 , 𝑐 ,
Sequence database Projected database of

𝑆2 = ⟨ 𝑎, 𝑏 ,𝑏 , 𝑐 ⟩ 𝑎 ⟩
𝑆2 = ⟨ 𝑎, 𝑏 , 𝑏 ,
𝑆3 = ⟨ 𝑏 , 𝑐 , {𝑑}⟩ 𝑐 ⟩
𝑆4 = ⟨ 𝑏 , 𝑎, 𝑏 , ⟨ 𝑏 , 𝑐,
{𝑐}⟩ 𝑆3 =
{𝑑}⟩
𝑚𝑖𝑛𝑠𝑢𝑝 = 𝑆4 = ⟨ 𝑏 , 𝑎, 𝑏 ,
3 {𝑐}⟩
{𝑏}
Step 7 – Find patterns starting with

PrefixSpan does a database projection for ⟨ 𝑏


⟩:

𝑆1 = ⟨ 𝑎, 𝑏 ,𝑐 , 𝑎 ⟩ ⟨{𝑏}⟩ 𝑆1 = ⟨ 𝑐 , 𝑎 ⟩
Sequence database Projected database of

𝑆2 = ⟨ 𝑎, 𝑏 ,𝑏 , 𝑐 ⟩ 𝑆2 = ⟨ 𝑏 , 𝑐⟩
𝑆3 = ⟨ 𝑏 , 𝑐 , {𝑑}⟩ 𝑆3 = ⟨ 𝑐 , {𝑑}⟩
𝑆4 = ⟨ 𝑏 , 𝑎, 𝑏 , 𝑆4 = ⟨ 𝑎, 𝑏 , {𝑐}⟩
{𝑐}⟩

𝑚𝑖𝑛𝑠𝑢𝑝 =
3
{𝑏}
Step 7 – Find patterns starting with

Then, PrefixSpan eliminates infrequent patterns:

⟨{𝑏}⟩ 𝑆1 = ⟨ 𝑐 , 𝑎 ⟩
Projected database of

𝑆2 = ⟨ 𝑏 , 𝑐⟩
𝑆3 = ⟨ 𝑐 , {𝑑}⟩
𝑆4 = ⟨ 𝑎, 𝑏 , {𝑐}⟩

𝑚𝑖𝑛𝑠𝑢𝑝 = ⟨ 𝑏 , {𝑎}⟩ support :


Result:
3
⟨ 𝑏 , {𝑏}⟩ support :
2
2
⟨ 𝑏 , {𝑐}⟩ support :
3
Step 8 – Find patterns starting with ⟨ 𝑏}, {𝑐

PrefixSpan does a database projection for⟨ 𝑏}, {𝑐
⟩:
Projected database of ⟨ 𝑏 ,
⟨{𝑏}⟩ 𝑆1 = ⟨ 𝑐 , 𝑎 ⟩ {𝑐}⟩ 𝑆1 =
Projected database of
⟨ 𝑐 , 𝑎
𝑆2 = ⟨ 𝑏 , 𝑐⟩ ⟩
𝑆2 = ⟨ 𝑏 , 𝑐
𝑆3 = ⟨ 𝑐 , {𝑑}⟩ ⟩
𝑆4 = ⟨ 𝑎, 𝑏 , {𝑐}⟩ ⟨ 𝑐 ,
𝑆3 =
{𝑑}⟩
𝑚𝑖𝑛𝑠𝑢𝑝 = 𝑆4 = ⟨ 𝑎, 𝑏 ,
3 {𝑐}⟩

71
Step 8 – Find patterns starting with ⟨ 𝑏}, {𝑐

PrefixSpan does a database projection for⟨ 𝑏}, {𝑐
⟩:
Projected database of ⟨ 𝑏 ,
⟨{𝑏}⟩ 𝑆1 = ⟨ 𝑐 , 𝑎 ⟩ {𝑐}⟩ 𝑆1 = ⟨ 𝑎 ⟩
Projected database of

𝑆2 = ⟨ 𝑏 , 𝑐⟩ 𝑆3 = ⟨{𝑑}⟩
𝑆3 = ⟨ 𝑐 , {𝑑}⟩
𝑆4 = ⟨ 𝑎, 𝑏 , {𝑐}⟩

𝑚𝑖𝑛𝑠𝑢𝑝 =
3
Step 8 – Find patterns starting with ⟨ 𝑏}, {𝑐

starting with⟨ 𝑏}, {𝑐 ⟩ that has one more item:
Then, PrefixSpan counts the support of each sequential pattern

Projected database of ⟨ 𝑏 ,
{𝑐}⟩ 𝑆1 = ⟨ 𝑎 ⟩
𝑆3 = ⟨{𝑑}⟩

⟨ 𝑏 , 𝑐 , {𝑎}⟩ support :
Result:

1𝑏 , 𝑐 ,
𝑚𝑖𝑛𝑠𝑢𝑝 = 3 {𝑑}
support :
1
Final result:
Those are the frequent sequential
patterns:
⟨ 𝑎 ⟩ support : 3
• ⟨ 𝑏 ⟩ support : 4

• ⟨ 𝑐 ⟩ support : 4
• ⟨ 𝑎 , {𝑐}⟩ support: 3
• 𝑎, 𝑏
• ⟨ 𝑎, 𝑏 , {𝑐}⟩
support : 3

support : 3
• ⟨ 𝑏 , {𝑐}⟩
support : 3

You might also like