Pattern Sequence Mining: Presented By: Devika Mittal


PATTERN SEQUENCE MINING

Presented By: DEVIKA MITTAL O915CS081019

CONTENTS
Some terminology
  association rule
  sequential pattern
  sequence database
  support
What is Sequential Pattern Mining?
Challenges
Algorithms
Applications

Association rule
An association rule has the form Buy A => Buy B. Association rule mining does not take the time stamp into account.
NOTE: If we take the time stamp into account, we can get more accurate and useful rules, such as "Buy A implies Buy B within a week" or "people usually Buy A every week". Such rules help us make sounder decisions.
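
A time-aware rule such as "Buy A implies Buy B within a week" can be checked against time-stamped purchases. The following is a minimal sketch only: the customer ids, dates, and the helper name followed_within are made up for illustration and do not come from the slides.

```python
from datetime import date, timedelta

# Hypothetical purchase histories: customer id -> list of (date, item).
purchases = {
    "c1": [(date(2024, 1, 1), "A"), (date(2024, 1, 4), "B")],
    "c2": [(date(2024, 1, 2), "A"), (date(2024, 1, 20), "B")],
    "c3": [(date(2024, 1, 3), "A")],
    "c4": [(date(2024, 1, 5), "B")],
}

def followed_within(history, first, then, window):
    """True if some purchase of `then` falls on or after a purchase of
    `first` and no later than `window` after it."""
    for t1, item1 in history:
        if item1 != first:
            continue
        for t2, item2 in history:
            if item2 == then and timedelta(0) <= t2 - t1 <= window:
                return True
    return False

# Customers who bought A at all, and those who then bought B within a week.
bought_a = [c for c, h in purchases.items() if any(i == "A" for _, i in h)]
followed = [c for c in bought_a
            if followed_within(purchases[c], "A", "B", timedelta(days=7))]

# Fraction of A-buyers for whom the time-constrained rule holds.
confidence = len(followed) / len(bought_a)
print(followed, confidence)
```

Without the time stamp, customer c2 would also count toward "Buy A => Buy B", even though the two purchases are 18 days apart.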

Sequential pattern
It is a sequence of itemsets that frequently occur in a specific order. All items in the same itemset are supposed to have the same transaction time value, or to fall within a time gap. All transactions of a customer are viewed together as one sequence.

Sequence Database
A sequence database S is shown below, with min support = 2.
The set of items in the database is {a, b, c, d, e, f, g}.

Sequence Id   Sequence
10            <a(abc)(ac)d(cf)>
20            <(ad)c(bc)(ae)>
30            <(ef)(ab)(df)cb>
40            <eg(af)cbc>

The sequence <a(abc)(ac)d(cf)> has five elements. It is also a 9-sequence, since there are 9 instances of items in the sequence.
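
The two counts for the sequence with Id 10 can be reproduced with a few lines of code. This is a small sketch that assumes a sequence is represented as a Python list of sets, one set per element; the representation and function names are illustrative choices, not part of the slides.

```python
# Sequence 10 from the database: each element is an itemset.
sequence_10 = [{"a"}, {"a", "b", "c"}, {"a", "c"}, {"d"}, {"c", "f"}]

def num_elements(sequence):
    """Number of elements (itemsets) in the sequence."""
    return len(sequence)

def sequence_length(sequence):
    """Number of item instances; a sequence of length k is a k-sequence."""
    return sum(len(itemset) for itemset in sequence)

print(num_elements(sequence_10))     # 5 elements
print(sequence_length(sequence_10))  # 9 item instances
```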

Support
A customer supports a sequence s if s is contained in the corresponding customer sequence. The support of sequence s is defined as the fraction of customers who support this sequence.

Support(s) = (Number of supporting customers) / (Total number of customers)
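
The support definition can be sketched in code using the four-sequence database from the Sequence Database slide. The list-of-sets representation and the names contains, support_count, and support are assumptions made for this illustration. A pattern is contained in a sequence when its elements can be matched, in order, to elements of the sequence that are supersets of them.

```python
# The example sequence database, one list of itemsets per sequence id.
database = {
    10: [{"a"}, {"a", "b", "c"}, {"a", "c"}, {"d"}, {"c", "f"}],
    20: [{"a", "d"}, {"c"}, {"b", "c"}, {"a", "e"}],
    30: [{"e", "f"}, {"a", "b"}, {"d", "f"}, {"c"}, {"b"}],
    40: [{"e"}, {"g"}, {"a", "f"}, {"c"}, {"b"}, {"c"}],
}

def contains(sequence, pattern):
    """True if `pattern` is a subsequence of `sequence`: every pattern
    element is a subset of a distinct sequence element, in order."""
    pos = 0
    for wanted in pattern:
        # Greedily skip to the earliest element that covers `wanted`.
        while pos < len(sequence) and not wanted <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True

def support_count(db, pattern):
    """Number of sequences (customers) that support the pattern."""
    return sum(contains(seq, pattern) for seq in db.values())

def support(db, pattern):
    """Fraction of sequences (customers) that support the pattern."""
    return support_count(db, pattern) / len(db)

pattern_ab_c = [{"a", "b"}, {"c"}]            # the pattern <(ab)c>
print(support_count(database, pattern_ab_c))  # 2 (sequences 10 and 30)
print(support(database, pattern_ab_c))        # 0.5
```

With min support = 2 expressed as a count, the pattern <(ab)c> is frequent in this database.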

What Is Sequential Pattern Mining?


Given a set of sequences, find the complete set of frequent subsequences.
Sequential pattern mining tries to find the relationships between occurrences of sequential events, that is, whether the occurrences follow any specific order.
Sequential pattern mining is the process of extracting the sequential patterns whose support exceeds a predefined minimal support threshold.

Example
From a book store's transaction history database, we can find the frequent sequential purchasing patterns. For example, 80% of customers who bought the book Database Management typically bought the book Data Warehouse and then bought the book Web Information System within a certain time gap.

Types:
String mining:
  used in biology, to examine gene and protein sequences
  primarily concerned with sequences with a single member at each position
Itemset mining:
  used more often in marketing
  concerned with multiple symbols at each position
  a popular approach to text mining

Challenges on Sequential Pattern Mining


A huge number of possible sequential patterns are hidden in databases.
A mining algorithm should
  find the complete set of patterns, when possible, satisfying the minimum support (frequency) threshold
  be highly efficient and scalable, involving only a small number of database scans
  be able to incorporate various kinds of user-specific constraints

Sequential Pattern Mining Algorithms


Apriori-based Approaches
  GSP
  SPADE
  These sequential pattern mining methods follow the Apriori methodology, which encounters problems when the sequence database is large.

Pattern-Growth-based Approaches
  FreeSpan
  PrefixSpan
  PrefixSpan substantially reduces the size of projected databases, which leads to efficient processing.
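
The pattern-growth idea can be illustrated with a simplified sketch in the style of PrefixSpan. It assumes every element is a single item (the string mining case), so it is not the full algorithm for itemset sequences, and the toy data and function name are invented for this example. Each frequent prefix is extended only with items that are frequent in its projected database, which is stored as (sequence index, start position) pairs instead of copied suffixes.

```python
from collections import defaultdict

def prefixspan_single_items(sequences, min_support):
    """Return {pattern tuple: support count} for all frequent patterns.
    Simplified: each sequence element is one item, not an itemset."""
    results = {}

    def mine(prefix, projected):
        # Count, per item, how many projected suffixes contain it.
        counts = defaultdict(int)
        for sid, start in projected:
            for item in set(sequences[sid][start:]):
                counts[item] += 1
        for item in sorted(counts):
            if counts[item] < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = counts[item]
            # Project: keep what follows the first occurrence of `item`.
            new_projected = []
            for sid, start in projected:
                seq = sequences[sid]
                if item in seq[start:]:
                    new_projected.append((sid, seq.index(item, start) + 1))
            mine(pattern, new_projected)

    mine((), [(sid, 0) for sid in range(len(sequences))])
    return results

# Hypothetical toy database of three single-item sequences.
toy_sequences = [list("abcb"), list("acb"), list("bc")]
patterns = prefixspan_single_items(toy_sequences, min_support=2)
for pattern, count in sorted(patterns.items()):
    print("".join(pattern), count)
```

Because the projected database shrinks with every extension, deeper recursion levels scan less and less data, in contrast to Apriori-style methods that generate candidates and rescan the full database.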

Applications
Applications of sequential pattern mining

Customer shopping sequences: first buy a computer, then a CD-ROM, and then a digital camera, within 3 months.
Medical treatments, natural disasters (e.g., earthquakes), science and engineering processes, stocks and markets, etc.
Telephone calling patterns, Weblog click streams
DNA sequences and gene structures

CONCLUSION:
Further improvements are still likely. Results need better balance and more clarity. More research is needed. In essence, the database needs a way to store more pages, combat data, and still provide (or attempt to provide) pertinent results.

THANK YOU

ANY QUERIES?
