Pattern Sequence Mining: Presented By: Devika Mittal
CONTENTS
Some terminology: association rule, sequential pattern, sequence database, support
What is Sequential Pattern Mining?
Challenges
Algorithms
Applications
Association Rule
An association rule has the form Buy A ⇒ Buy B. Association rule mining does not take the time stamp into account.
NOTE: If we take the time stamp into account, we can get more accurate and useful rules, such as "Buy A implies Buy B within a week" or "people usually buy A every week", which help us make sounder decisions.
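To make the difference concrete, here is a minimal Python sketch. The purchase log, the customer names and the helpers `buys_b_within` and `rule_confidences` are hypothetical, made up only for illustration. It compares the confidence of the plain rule Buy A ⇒ Buy B (time stamps ignored) with a time-aware version that requires B to be bought within 7 days after A.

```python
# Hypothetical purchase log: customer -> list of (day, item) purchases.
purchases = {
    "c1": [(1, "A"), (3, "B")],
    "c2": [(1, "A"), (20, "B")],
    "c3": [(5, "B"), (9, "A")],
    "c4": [(2, "A")],
}


def buys_b_within(events, a, b, max_days):
    """True if some purchase of b happens 0..max_days days after a purchase of a."""
    a_days = [day for day, item in events if item == a]
    b_days = [day for day, item in events if item == b]
    return any(0 <= bd - ad <= max_days for ad in a_days for bd in b_days)


def rule_confidences(purchases, a, b, max_days):
    """Confidence of 'Buy a => Buy b' without and with the time constraint."""
    buyers = [ev for ev in purchases.values() if any(item == a for _, item in ev)]
    # Classic association rule: time stamps (and order) are ignored.
    plain = sum(any(item == b for _, item in ev) for ev in buyers) / len(buyers)
    # Time-aware rule: b must be bought within max_days after a.
    timed = sum(buys_b_within(ev, a, b, max_days) for ev in buyers) / len(buyers)
    return plain, timed
```

For this toy log, `rule_confidences(purchases, "A", "B", 7)` gives (0.75, 0.25): three of the four A-buyers bought B at some point, but only one of them did so within a week after buying A.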
Sequential Pattern
A sequential pattern is a sequence of itemsets that frequently occurs in a specific order. All items in the same itemset are assumed to have the same transaction time, or to occur within a given time gap. All transactions of one customer, taken together, are viewed as a sequence.
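Turning a customer's transactions into a sequence can be sketched as follows. This is a minimal Python sketch under stated assumptions: the transaction log and the function name `to_customer_sequences` are hypothetical, and items are grouped into one itemset only when they share exactly the same time stamp (a time-gap window is not modeled).

```python
from collections import defaultdict

# Hypothetical transaction log: (customer_id, transaction_time, item).
transactions = [
    (1, 2, "b"), (1, 1, "a"), (1, 2, "c"),
    (2, 5, "d"), (2, 3, "a"), (2, 3, "e"),
]


def to_customer_sequences(transactions):
    """Group items by customer, then by time; each time stamp becomes one itemset."""
    by_customer = defaultdict(lambda: defaultdict(set))
    for cust, t, item in transactions:
        by_customer[cust][t].add(item)
    # Order each customer's itemsets by transaction time.
    return {
        cust: [frozenset(times[t]) for t in sorted(times)]
        for cust, times in by_customer.items()
    }
```

Here customer 1 becomes the sequence <a(bc)> and customer 2 becomes <(ae)d>.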
Sequence Database
The sequence database S below is used as an example, with min_support = 2.

Sequence Id | Sequence
10 | <a(abc)(ac)d(cf)>
20 | <(ad)c(bc)(ae)>
30 | <(ef)(ab)(df)cb>
40 | <eg(af)cbc>

The set of items in the database is {a, b, c, d, e, f, g}.
The sequence <a(abc)(ac)d(cf)> has five elements: (a), (abc), (ac), (d) and (cf). It is also a 9-sequence, since there are 9 instances of items in it.
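The element and length counts of the first sequence, a(abc)(ac)d(cf), can be checked mechanically. Below is a minimal Python sketch of a parser for this notation; the function name `parse_sequence` is hypothetical, and it assumes every item is a single character, with letters inside parentheses forming one element and a bare letter forming a one-item element.

```python
def parse_sequence(text):
    """Parse '<a(abc)(ac)d(cf)>' into a list of frozensets, one per element.

    Assumes single-character items: a bare letter is a one-item element,
    letters inside parentheses form one multi-item element.
    """
    body = text.strip().lstrip("<").rstrip(">")
    elements, i = [], 0
    while i < len(body):
        if body[i] == "(":
            j = body.index(")", i)
            elements.append(frozenset(body[i + 1:j]))
            i = j + 1
        else:
            elements.append(frozenset(body[i]))
            i += 1
    return elements


seq = parse_sequence("<a(abc)(ac)d(cf)>")
num_elements = len(seq)                # number of elements
length = sum(len(e) for e in seq)      # number of item instances
```

For the first sequence this yields 5 elements and length 9, i.e. a 9-sequence.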
Support
A customer supports a sequence s if s is contained in that customer's sequence. The support of a sequence s is defined as the fraction of customers who support it:
Support(s) = (number of supporting customers) / (total number of customers)
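A minimal Python sketch of support counting on the example database follows. It assumes each element is written as a string of single-character items, and the names `DB`, `contains` and `support` are hypothetical. Containment is checked greedily: each pattern element must be a subset of a sequence element that comes after the one matched for the previous pattern element.

```python
# Example sequence database from the slide; each string is one element (itemset).
DB = {
    10: ["a", "abc", "ac", "d", "cf"],
    20: ["ad", "c", "bc", "ae"],
    30: ["ef", "ab", "df", "c", "b"],
    40: ["e", "g", "af", "c", "b", "c"],
}


def contains(sequence, pattern):
    """True if each pattern element is a subset of a distinct, later sequence element."""
    pos = 0
    for element in pattern:
        # Skip sequence elements until one contains all items of this pattern element.
        while pos < len(sequence) and not set(element) <= set(sequence[pos]):
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next pattern element must match strictly later
    return True


def support(db, pattern):
    """Fraction of customer sequences that contain the pattern."""
    return sum(contains(seq, pattern) for seq in db.values()) / len(db)
```

`support(DB, ["ab", "c"])` returns 0.5: the pattern <(ab)c> is contained in sequences 10 and 30 only, so its support count of 2 meets min_support = 2.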
What is Sequential Pattern Mining?
Given a sequence database, sequential pattern mining finds its frequent subsequences. It looks for relationships between occurrences of sequential events, i.e., whether the events tend to occur in a specific order. Formally, it is the process of extracting the sequential patterns whose support is no less than a predefined minimum support threshold.
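The first step of most mining algorithms is to find the frequent length-1 patterns. Here is a minimal Python sketch on the example database, assuming single-character items and an absolute min_support of 2; the names `DB`, `MIN_SUPPORT` and `frequent_items` are hypothetical.

```python
from collections import Counter

# Example sequence database from the slide; each string is one element (itemset).
DB = {
    10: ["a", "abc", "ac", "d", "cf"],
    20: ["ad", "c", "bc", "ae"],
    30: ["ef", "ab", "df", "c", "b"],
    40: ["e", "g", "af", "c", "b", "c"],
}
MIN_SUPPORT = 2  # absolute count, as in the example


def frequent_items(db, min_support):
    """Length-1 sequential patterns: items contained in at least min_support sequences."""
    counts = Counter()
    for seq in db.values():
        # Count each item once per customer sequence, however often it repeats.
        counts.update(set("".join(seq)))
    return {item: n for item, n in sorted(counts.items()) if n >= min_support}
```

Item g appears only in sequence 40, so it is dropped; a, b and c have support 4 and d, e and f have support 3. Longer patterns are then built from these items.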
Example
From a book store's transaction history, we can find frequent sequential purchasing patterns. For example, 80% of customers who bought the book Database Management typically bought the book Data Warehouse and then the book Web Information System, within a certain time gap.
Types:
String mining: used in biology to examine gene and protein sequences; it is primarily concerned with sequences that have a single member at each position.
Itemset mining: used more often in marketing; it is concerned with multiple symbols at each position. It is also a popular approach to text mining.
Challenges:
A huge number of possible sequential patterns are hidden in databases. A mining algorithm should:
find the complete set of patterns, when possible, satisfying the minimum support (frequency) threshold;
be highly efficient and scalable, involving only a small number of database scans;
be able to incorporate various kinds of user-specific constraints.
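Apriori-based Approaches
GSP, SPADE
These sequential pattern mining methods follow the Apriori methodology: every subsequence of a frequent sequence must itself be frequent, so candidate sequences are generated and tested level by level. This approach encounters problems when a sequence database is large, because the number of candidates can become huge (and GSP scans the database once per level).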
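The generate-and-test idea can be sketched in Python as below. This is a simplified, hypothetical illustration rather than GSP itself: it assumes every element holds a single item (as in string mining), so each sequence is a plain string, and it ignores time constraints. The names `is_subsequence` and `apriori_sequences` and the toy database are made up.

```python
def is_subsequence(pattern, seq):
    """True if the items of pattern appear in seq in the same order (gaps allowed)."""
    it = iter(seq)
    return all(item in it for item in pattern)  # 'in' consumes the iterator


def apriori_sequences(db, min_support):
    """Level-wise, Apriori-style mining of frequent sequences of single items.

    db: list of strings, one per customer sequence; min_support: absolute count.
    Returns {pattern: support count}.
    """
    def count(pattern):
        return sum(is_subsequence(pattern, seq) for seq in db)

    items = sorted({item for seq in db for item in seq})
    level = {p: n for p in items if (n := count(p)) >= min_support}
    frequent = {}
    while level:
        frequent.update(level)
        # Join step: extend a frequent k-sequence p with the last item of a frequent
        # k-sequence q whenever p without its first item equals q without its last item.
        candidates = {p + q[-1] for p in level for q in level if p[1:] == q[:-1]}
        # Prune step (Apriori property): drop a candidate if deleting any one item
        # gives a k-sequence that is not frequent.
        candidates = {
            c for c in candidates
            if all(c[:i] + c[i + 1:] in level for i in range(len(c)))
        }
        # Test step: count each surviving candidate against the database.
        counts = {c: count(c) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
    return frequent


result = apriori_sequences(["abcd", "acd", "bcd", "abd"], 2)
```

On this toy database the candidate abc survives pruning (ab, ac and bc are all frequent) but is then rejected by counting, since only one sequence contains it. Every level requires counting candidates against the whole database, which is the cost the slide refers to.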
Pattern-Growth-based Approaches
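FreeSpan, PrefixSpan
Instead of generating and testing candidates, these methods grow patterns by recursively projecting the sequence database. PrefixSpan, which projects by prefix, substantially reduces the size of the projected databases and leads to efficient processing.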
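Prefix projection can be sketched in Python as below. This is a simplified, hypothetical version, not the full published algorithm: it assumes single-item elements (each sequence is a plain string), and it copies suffixes instead of using pseudo-projection. The function name `prefixspan` and the toy database are made up for illustration.

```python
from collections import Counter


def prefixspan(db, min_support):
    """Simplified PrefixSpan for sequences of single items.

    db: list of strings (one item per position); min_support: absolute count.
    Returns {pattern: support count}.
    """
    results = {}

    def mine(prefix, projected):
        # Count each item at most once per suffix in the projected database.
        counts = Counter()
        for suffix in projected:
            counts.update(set(suffix))
        for item in sorted(counts):
            if counts[item] < min_support:
                continue
            pattern = prefix + item
            results[pattern] = counts[item]
            # Project: for each suffix containing item, keep what follows its
            # first occurrence; only these suffixes can extend the new pattern.
            next_projected = [s[s.index(item) + 1:] for s in projected if item in s]
            mine(pattern, next_projected)

    mine("", list(db))
    return results


patterns = prefixspan(["abcd", "acd", "bcd", "abd"], 2)
```

No candidate is ever generated: a pattern is extended only by items that are frequent in its projected database, and each projected database holds only the suffixes after the prefix, so it shrinks as patterns grow.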
Applications
Applications of sequential pattern mining
Customer shopping sequences: first buy a computer, then a CD-ROM, and then a digital camera, within 3 months.
Medical treatments, natural disasters (e.g., earthquakes), science and engineering processes, stocks and markets, etc.
Telephone calling patterns, Weblog click streams
DNA sequences and gene structures
CONCLUSION:
Further improvements are still likely. Results need better balance and more clarity, and more research is needed. In essence, databases need a way to store more pages and compact data while still providing (or attempting to provide) pertinent results.
THANK YOU
ANY QUERIES?