New Association Rule
New Association Rule
New Association Rule
Association
Mining Association Rules
in Large Databases
Association rule mining
Mining single-dimensional Boolean association
rules from transactional databases
Mining multilevel association rules from
transactional databases
Mining multidimensional association rules from
transactional databases and data warehouse
From association mining to correlation analysis
Constraint-based association mining
Summary
What Is Association Mining?
Applications:
Basket data analysis, cross-marketing, catalog
design, loss-leader analysis, clustering,
classification, etc.
Example:
itself
Prune Step: Any (k-1)-itemset that is not
frequent cannot be a subset of a frequent k-
itemset
Pseudo-code:
Ck: Candidate itemset of size k
Lk : frequent itemset of size k
L1 = {frequent items};
for (k = 1; Lk !=; k++) do begin
Ck+1 = candidates generated from Lk;
for each transaction t in database do
increment the count of all candidates in Ck+1
that are contained in t
Lk+1 = candidates in Ck+1 with min_support
The Apriori Algorithm — Example
Example 2:
How to Generate Candidates?
Suppose the items in Lk-1 are listed in an
order
Step 1: self-joining Lk-1
insert into Ck
select p.item1, p.item2, …, p.itemk-1, q.itemk-1
from Lk-1 p, Lk-1 q
where p.item1=q.item1, …, p.itemk-2=q.itemk-2,
p.itemk-1 < q.itemk-1
Step 2: pruning
forall itemsets c in Ck do
How to Count Supports of
Candidates?
Why counting supports of candidates a problem?
The total number of candidates can be very
huge
One transaction may contain many
candidates
Method:
Candidate itemsets are stored in a hash-tree
Leaf node of hash-tree contains a list of
itemsets and counts
Interior node contains a hash table
Subset function: finds all the candidates
Example of Generating
Candidates
Self-joining: L3*L3
abcd from abc and abd
acde from acd and ace
Pruning:
acde is removed because ade is not in L3
C4={abcd}
Methods to Improve Apriori’s
Efficiency
Level 1 Milk
min_sup = 5%
[support = 10%]
Back
Reduced Support
Multi-level mining with reduced support
Level 1 Milk
min_sup = 5%
[support = 10%]
Back
Multi-level Association:
Redundancy Filtering
Other Interestingness Measures:
Interest
Interest (correlation, lift)P( A B)
P ( A) P ( B )
taking both P(A) and P(B) in consideration
P(A^B)=P(B)*P(A), if A and B are independent
events
A and B negatively correlated, if the value is less
than 1; otherwise A and B positively
Itemset Support correlated
Interest
X 1 1 1 1 0 0 0 0
X,Y 25% 2
Y 1 1 0 0 0 0 0 0 X,Z 37.50% 0.9
Z 0 1 1 1 1 1 1 1 Y,Z 12.50% 0.57
Mining Association Rules in
Large Databases
Association rule mining
Mining single-dimensional Boolean association
rules from transactional databases
Mining multilevel association rules from
transactional databases
Mining multidimensional association rules from
transactional databases and data warehouse
From association mining to correlation analysis
Constraint-based association mining
Summary
Constraint-Based Mining
Interactive, exploratory mining giga-bytes of data?
Could it be real? — Making good use of constraints!
What kinds of constraints can be used in mining?
Knowledge type constraint: classification, association,
etc.
Data constraint: SQL-like queries
Find product pairs sold together in Vancouver in
Dec.’98.
Dimension/level constraints:
in relevance to region, price, brand, customer
category.
Rule constraints
small sales (price < $10) triggers big sales (sum >
$200).
Interestingness constraints:
strong rules (min_support 3%, min_confidence
Mining Association Rules in
Large Databases
Association rule mining
Mining single-dimensional Boolean association
rules from transactional databases
Mining multilevel association rules from
transactional databases
Mining multidimensional association rules from
transactional databases and data warehouse
From association mining to correlation analysis
Constraint-based association mining
Summary