Clustering and Association Rule
Definition
• Finding groups of objects such that the objects in a
group will be similar (or related) to one another and
different from (or unrelated to) the objects in other
groups
Requirements of Clustering in Data Mining
• Scalability
• Ability to deal with different types of attributes
• Discovery of clusters with arbitrary shape
• Minimal requirements for domain knowledge to
determine input parameters
• Able to deal with noise and outliers
• Insensitive to order of input records
• High dimensionality
• Incorporation of user-specified constraints
• Interpretability and usability
Type of data in clustering analysis
• Interval-scaled variables: for interval attributes, the
differences between values are meaningful, i.e., a unit of
measurement exists. (+, -)
– Examples: calendar dates, temperatures in Celsius or Fahrenheit
• Nominal: The values of a nominal attribute are just different
names, i.e., nominal attributes provide only enough
information to distinguish one object from another. (=, ≠)
– Examples: ID numbers, eye color, zip codes
• Ratio variables:
• For ratio variables, both differences and ratios
are meaningful. (*, /)
– Examples: temperature in Kelvin, length, time,
counts
• Binary variables: have only two states, e.g., 0/1 or yes/no; they may be
symmetric (both states equally important) or asymmetric.
• Variables of mixed types: a single database may contain variables of
several of the above types (see "Dissimilarity between Variables of Mixed
Types" below).
Properties of Attribute Values
• The type of an attribute depends on which operations are meaningful for its
values: distinctness (=, ≠), order (<, >), addition (+, -), and multiplication (*, /).
Dissimilarity Matrix
• Dissimilarity matrix: pairwise dissimilarities between the n objects, stored in
an n-by-n table:

\begin{pmatrix}
0 & & & & \\
d(2,1) & 0 & & & \\
d(3,1) & d(3,2) & 0 & & \\
\vdots & \vdots & \vdots & \ddots & \\
d(n,1) & d(n,2) & \cdots & \cdots & 0
\end{pmatrix}
• Dissimilarity/Similarity metric: similarity is expressed in terms of a
distance function d(i, j), which is typically a metric.
• d(i, j) is the measured difference or dissimilarity between objects i and j.
• In general, d(i, j) is a nonnegative number that is close to 0 when
objects i and j are highly similar ("near" each other) and grows larger
the more they differ.
• Minkowski distance:

d(i, j) = \sqrt[q]{|x_{i1} - x_{j1}|^q + |x_{i2} - x_{j2}|^q + \cdots + |x_{ip} - x_{jp}|^q}

where i = (x_{i1}, x_{i2}, …, x_{ip}) and j = (x_{j1}, x_{j2}, …, x_{jp}) are two p-dimensional
data objects, and q is a positive integer (a small code sketch follows the properties below).
- Properties
• d(i, j) ≥ 0
• d(i, i) = 0
• d(i, j) = d(j, i)   (symmetry)
• d(i, j) ≤ d(i, k) + d(k, j)   (triangle inequality)
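As a minimal sketch of the Minkowski distance above, assuming the objects are plain Python sequences of numeric attribute values (the function name minkowski_distance is ours, not from the slides):

```python
def minkowski_distance(x, y, q=2):
    """Minkowski distance between two p-dimensional objects x and y.

    q = 1 gives the Manhattan distance, q = 2 the Euclidean distance.
    """
    if len(x) != len(y):
        raise ValueError("objects must have the same number of attributes")
    return sum(abs(a - b) ** q for a, b in zip(x, y)) ** (1.0 / q)

# d(i, j) is nonnegative, d(i, i) = 0, and the function is symmetric.
print(minkowski_distance([1, 2], [4, 6], q=2))  # 5.0 (Euclidean)
print(minkowski_distance([1, 2], [4, 6], q=1))  # 7.0 (Manhattan)
```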
• Nominal variables: dissimilarity by simple matching,

d(i, j) = \frac{p - m}{p}

where m is the number of variables on which i and j match and p is the total
number of variables.
• Ordinal variables: map each rank r_{if} ∈ {1, …, M_f} onto [0, 1] via

z_{if} = \frac{r_{if} - 1}{M_f - 1}

and then compute the dissimilarity as for interval-scaled variables.
• Ratio-scaled variables: apply a logarithmic transformation y_{if} = log(x_{if});
y_{if} can then be treated as interval-valued.
– Alternatively, treat them as continuous ordinal data and treat their rank as
interval-scaled.
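A small illustration of the rank scaling z_{if} = (r_{if} - 1)/(M_f - 1) and the simple-matching dissimilarity (p - m)/p; the function names are illustrative, not from the slides:

```python
def rank_to_interval(rank, M):
    """Map an ordinal rank in {1, ..., M} onto [0, 1]."""
    return (rank - 1) / (M - 1)

def simple_matching_dissimilarity(obj_i, obj_j):
    """(p - m) / p for two nominal attribute vectors of equal length p."""
    p = len(obj_i)
    m = sum(1 for a, b in zip(obj_i, obj_j) if a == b)
    return (p - m) / p

print(rank_to_interval(2, M=3))                                    # 0.5
print(simple_matching_dissimilarity(["red", "A"], ["blue", "A"]))  # 0.5
```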
Dissimilarity between Variables of Mixed Types
• A database may contain all six types of variables. The per-variable
dissimilarities are combined as a weighted average:

d(i, j) = \frac{\sum_{f=1}^{p} \delta_{ij}^{(f)} d_{ij}^{(f)}}{\sum_{f=1}^{p} \delta_{ij}^{(f)}}

where \delta_{ij}^{(f)} = 0 if x_{if} or x_{jf} is missing (or if x_{if} = x_{jf} = 0 and
variable f is asymmetric binary), and \delta_{ij}^{(f)} = 1 otherwise.
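A hedged sketch of the mixed-type formula above, under the simplifying assumption that every attribute is either nominal or interval-scaled (already rescaled to [0, 1]) and that None marks a missing value; asymmetric binary attributes are not handled here:

```python
def mixed_dissimilarity(x_i, x_j, types):
    """Combine per-attribute dissimilarities for objects of mixed attribute types.

    x_i, x_j : lists of attribute values (None = missing)
    types    : list of "nominal" or "interval" tags, one per attribute
               (interval values are assumed to be pre-scaled to [0, 1])
    """
    num, den = 0.0, 0.0
    for a, b, t in zip(x_i, x_j, types):
        if a is None or b is None:          # delta = 0 for missing values
            continue
        if t == "nominal":
            d = 0.0 if a == b else 1.0      # simple matching
        else:                               # interval-scaled, already in [0, 1]
            d = abs(a - b)
        num += d                            # delta = 1 otherwise
        den += 1.0
    return num / den if den else 0.0

print(mixed_dissimilarity(["red", 0.2, None], ["blue", 0.6, 0.9],
                          ["nominal", "interval", "interval"]))  # (1.0 + 0.4) / 2 = 0.7
```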
• A partitioning method such as k-means terminates when there is no change in the centroids.
Hierarchical Clustering
• Grouping data objects into a tree of clusters.
• Agglomerative (bottom-up strategy)
• Divisive (top-down strategy)
Example
• In agglomerative clustering, at each step the two clusters with the least
dissimilarity (minimum Euclidean distance) are merged.
• In divisive hierarchical clustering, the process starts with one cluster
containing all of the objects.
• A cluster is then split where the dissimilarity is largest (maximum Euclidean
distance between its points).
[Figure: agglomerative nesting (AGNES) merges the objects a step at a time, bottom-up, while divisive analysis (DIANA) splits the same data top-down, in the reverse order of steps.]
Agglomerative Clustering
• Keep merging the two closest clusters (single link: minimum distance) and updating the distance matrix.

      A     B     C     D,F   E
A     0.00  0.71  5.66  3.20  4.24
B     0.71  0.00  4.95  2.50  3.54
C     5.66  4.95  0.00  2.24  1.41
D,F   3.20  2.50  2.24  0.00  1.00
E     4.24  3.54  1.41  1.00  0.00

• The two closest clusters, D and F, have already been merged (shown as D,F above).
The next closest pair is A and B (distance 0.71), so they are merged and the matrix is updated below.
Example: Agglomerative Clustering III
A,B C D,F E
A,B 0.00 4.95 2.50 3.54
C 4.95 0.00 2.24 1.41
D,F 2.50 2.24 0.00 1.00
E 3.54 1.41 1.00 0.00
Example: Agglomerative Clustering IV
A,B C (D,F),E
A,B 0.00 4.95 2.50
C 4.95 0.00 1.41
(D,F),E 2.50 1.41 0.00
            A,B    ((D,F),E),C
A,B         0.00   2.50
((D,F),E),C 2.50   0.00
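The merge sequence above can be reproduced with a short single-link (minimum-distance) sketch; since the slides give only the distance matrix, not the point coordinates, the sketch works directly on that matrix:

```python
# Single-link agglomerative clustering on a precomputed distance matrix.
# Labels and distances are the ones from the first matrix above (D and F already merged).
labels = ["A", "B", "C", "D,F", "E"]
dist = {
    ("A", "B"): 0.71, ("A", "C"): 5.66, ("A", "D,F"): 3.20, ("A", "E"): 4.24,
    ("B", "C"): 4.95, ("B", "D,F"): 2.50, ("B", "E"): 3.54,
    ("C", "D,F"): 2.24, ("C", "E"): 1.41,
    ("D,F", "E"): 1.00,
}

def d(a, b):
    return dist.get((a, b), dist.get((b, a)))

while len(labels) > 1:
    # find the closest pair of clusters (minimum distance)
    x, y = min(((a, b) for i, a in enumerate(labels) for b in labels[i + 1:]),
               key=lambda p: d(*p))
    merged = f"({x}),({y})"
    print(f"merge {x} and {y} at distance {d(x, y):.2f}")
    # single link: distance to the new cluster = min of distances to its members
    for c in labels:
        if c not in (x, y):
            dist[(merged, c)] = min(d(x, c), d(y, c))
    labels = [c for c in labels if c not in (x, y)] + [merged]
# Output: merges A,B (0.71); D,F with E (1.00); then C (1.41); then the last two clusters (2.50).
```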
• Average-link (group average) distance between two clusters:

\mathrm{Dist}(Cluster_i, Cluster_j) = \frac{\sum_{c_i \in Cluster_i} \sum_{c_j \in Cluster_j} \mathrm{dist}(c_i, c_j)}{|Cluster_i|\,|Cluster_j|}
Strengths and Limitations
• Min (single link)
– Strength: can handle non-elliptical shapes
• Max (complete link)
– Strength: less susceptible to noise and outliers
• Group Average
– Strength: less susceptible to noise and outliers
– Limitation: biased towards globular clusters
Dendrogram
• A dendrogram is a tree diagram that records the sequence of merges (or splits)
and the distance at which each merge occurs.
• Average link: in this method, the distance between two clusters is the average
of all pair-wise distances between the data points in the two clusters.
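A minimal sketch of the average-link distance just defined, assuming the data points are coordinate tuples and Euclidean distance is used for the pair-wise distances:

```python
from itertools import product
from math import dist  # Euclidean distance between two points (Python 3.8+)

def average_link(cluster_i, cluster_j):
    """Average of all pair-wise distances between points of the two clusters."""
    total = sum(dist(p, q) for p, q in product(cluster_i, cluster_j))
    return total / (len(cluster_i) * len(cluster_j))

c1 = [(0.0, 0.0), (0.0, 1.0)]
c2 = [(3.0, 0.0)]
print(average_link(c1, c2))  # (3.0 + sqrt(10)) / 2 ≈ 3.08
```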
Association Rules: Example
TID  Items_bought
9    R, A, M
10   R, M
11   R, N
12   A, V, W

Frequent Itemsets   Support
{R}                 75%
{A}                 50%
{M}                 50%
{R, M}              50%

For rule R ⇒ M:
support = support({R} ∪ {M}) = 50%
confidence = support({R} ∪ {M}) / support({R}) = 66.6%
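The support and confidence figures above can be verified with a tiny sketch over the four transactions (TIDs 9 to 12):

```python
transactions = {
    9: {"R", "A", "M"},
    10: {"R", "M"},
    11: {"R", "N"},
    12: {"A", "V", "W"},
}

def support(itemset):
    """Fraction of transactions containing every item of the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions.values()) / len(transactions)

# Rule R => M
sup = support({"R", "M"})            # support({R} ∪ {M})
conf = sup / support({"R"})          # support({R} ∪ {M}) / support({R})
print(f"support = {sup:.0%}, confidence = {conf:.1%}")  # support = 50%, confidence = 66.7%
```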
Mining Association Rules:
What We Need to Know
TID   Items_bought
T100  {M, O, N, K, E, Y}
T200  {D, O, N, K, E, Y}
T300  {M, A, K, E}
T400  {M, U, C, K, Y}
T500  {C, O, O, K, I, E}
L1 = {frequent items};
for (k = 1; Lk != ∅; k++) do begin
    Ck+1 = candidates generated from Lk;
    for each transaction t in database do
        increment the count of all candidates in Ck+1 that are contained in t
    Lk+1 = candidates in Ck+1 with min_support
end
return ∪k Lk;
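A compact, runnable rendering of the pseudo-code above (a sketch, not the original implementation): candidates are generated by self-joining the current frequent itemsets and pruned with the Apriori property, support is counted by brute-force subset tests rather than a hash-tree, and min_support is taken as an absolute transaction count. The sample database is the one from the table above.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return every frequent itemset (as a frozenset) with its support count."""
    transactions = [set(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    # L1: frequent 1-itemsets
    Lk = {frozenset([i]) for i in items
          if sum(i in t for t in transactions) >= min_support}
    frequent, k = {}, 1
    while Lk:
        for c in Lk:
            frequent[c] = sum(c <= t for t in transactions)
        # candidate (k+1)-itemsets: join Lk with itself, then prune by the
        # Apriori property (every k-subset of a candidate must be frequent)
        Ck1 = {a | b for a in Lk for b in Lk if len(a | b) == k + 1}
        Ck1 = {c for c in Ck1 if all(frozenset(s) in Lk for s in combinations(c, k))}
        Lk = {c for c in Ck1 if sum(c <= t for t in transactions) >= min_support}
        k += 1
    return frequent

db = [{"M","O","N","K","E","Y"}, {"D","O","N","K","E","Y"},
      {"M","A","K","E"}, {"M","U","C","K","Y"}, {"C","O","O","K","I","E"}]
# prints the frequent itemsets with their support counts (11 itemsets for this data)
for itemset, count in sorted(apriori(db, min_support=3).items(),
                             key=lambda x: (len(x[0]), sorted(x[0]))):
    print(set(itemset), count)
```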
Important Details of Apriori
• How to generate candidates?
• Step 1: self-joining Lk-1 (Apriori assumes the items within each itemset are listed in lexicographic order).
• The join Lk-1 * Lk-1 is performed, where two members of Lk-1 are joinable if their first (k-2)
items are in common.
– Step 2: pruning
• Example of Candidate-generation
– L3={abc, abd, acd, ace, bcd}
– Self-joining: L3*L3
• abcd from abc and abd
• acde from acd and ace
– Pruning:
• acde is removed because ade is not in L3
– C4={abcd}
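The self-join-and-prune example above can be checked directly; in this sketch, itemsets are written as sorted tuples so the lexicographic ordering assumed by the join step is explicit:

```python
from itertools import combinations

L3 = [("a","b","c"), ("a","b","d"), ("a","c","d"), ("a","c","e"), ("b","c","d")]

# Step 1 (self-join): two k-itemsets are joinable if their first k-1 items agree.
candidates = [x + (y[-1],) for x in L3 for y in L3
              if x[:-1] == y[:-1] and x[-1] < y[-1]]
print(candidates)  # [('a','b','c','d'), ('a','c','d','e')]

# Step 2 (prune): drop candidates that have an infrequent k-subset.
L3_set = set(L3)
C4 = [c for c in candidates if all(s in L3_set for s in combinations(c, 3))]
print(C4)          # [('a','b','c','d')]  -- acde is pruned because ade is not in L3
```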
Frequent Itemset Generation Strategies
• Reduce the number of candidates (M)
– Complete search: M = 2^d
– Use pruning techniques to reduce M
• Method (for candidate support counting):
– Candidate itemsets are stored in a hash-tree
– Leaf node of hash-tree contains a list of itemsets and counts
– Interior node contains a hash table
– Subset function: finds all the candidates contained in a transaction
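A hedged sketch of the support-counting idea: instead of a real hash-tree, the candidates are indexed in an ordinary Python dict, and enumerating each transaction's k-subsets plays the role of the subset function:

```python
from itertools import combinations

def count_candidates(candidates, transactions, k):
    """Count, for each candidate k-itemset, how many transactions contain it."""
    counts = {frozenset(c): 0 for c in candidates}    # dict standing in for a hash-tree
    for t in transactions:
        for subset in combinations(sorted(t), k):     # "subset function"
            key = frozenset(subset)
            if key in counts:
                counts[key] += 1
    return counts

C2 = [{"O", "K"}, {"K", "E"}, {"M", "O"}]
db = [{"M","O","N","K","E","Y"}, {"D","O","N","K","E","Y"}, {"M","A","K","E"}]
print(count_candidates(C2, db, k=2))  # {O,K}: 2, {K,E}: 3, {M,O}: 1
```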
Challenges of Frequent Pattern Mining
• Challenges
– Multiple scans of transaction database
– Huge number of candidates
– Tedious workload of support counting for candidates
• Improving Apriori: general ideas
– Reduce passes of transaction database scans
– Shrink number of candidates
– Facilitate support counting of candidates
Bottleneck of Frequent-pattern Mining
• Multiple database scans are costly
• Mining long patterns needs many passes of scanning
and generates lots of candidates
– To find the frequent itemset i1 i2 … i100:
• # of scans: 100
• # of candidates: \binom{100}{1} + \binom{100}{2} + \cdots + \binom{100}{100} = 2^{100} - 1 \approx 1.27 \times 10^{30} (checked in the sketch after this list)
• Bottleneck: candidate-generation-and-test
• Can we avoid candidate generation?
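A quick check of the candidate-count arithmetic quoted in the list above:

```python
from math import comb

# binomial(100, 1) + binomial(100, 2) + ... + binomial(100, 100) = 2^100 - 1
total = sum(comb(100, k) for k in range(1, 101))
assert total == 2**100 - 1
print(f"{total:.2e}")  # 1.27e+30
```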
Mining Frequent Patterns Without
Candidate Generation
• Frequent pattern growth or FP-growth
[Example figures: the E-conditional FP-tree and the M-conditional FP-tree.]
A Special Case: Single Prefix Path in FP-tree
[Figure: an FP-tree r1 whose upper part is a single prefix path {} → a1:n1 → a2:n2 → a3:n3 can be decomposed into that single prefix-path part plus the lower, branching part (b1:m1, c1:k1, c2:k2, c3:k3); the two parts are mined separately and the results are concatenated.]
Mining Frequent Patterns With FP-trees
• Idea: Frequent pattern growth
– Recursively grow frequent patterns by pattern and
database partition
• Method
– For each frequent item, construct its conditional pattern-
base, and then its conditional FP-tree
– Repeat the process on each newly created conditional FP-
tree
– Until the resulting FP-tree is empty, or it contains only one
path—single path will generate all the combinations of its
sub-paths, each of which is a frequent pattern
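A compact FP-growth sketch following the method above. To keep it short, it recurses directly on conditional pattern bases (each pattern base standing in for the corresponding conditional FP-tree) instead of materializing the node-and-link tree structure; it is a simplified illustration, not FP-growth as implemented in any particular library.

```python
from collections import Counter

def fp_growth(pattern_base, min_support, suffix=frozenset()):
    """Mine frequent itemsets by recursing on conditional pattern bases.

    pattern_base : list of (frozenset_of_items, count) pairs; initially the
                   transaction database with a count of 1 per transaction.
    Yields (itemset, support_count) pairs, each itemset exactly once.
    """
    counts = Counter()
    for items, c in pattern_base:
        for i in items:
            counts[i] += c
    # locally frequent items, ordered from least to most frequent
    frequent = [i for i, c in sorted(counts.items(), key=lambda x: (x[1], x[0]))
                if c >= min_support]
    rank = {item: pos for pos, item in enumerate(frequent)}

    for item in frequent:
        new_suffix = suffix | {item}
        yield new_suffix, counts[item]
        # conditional pattern base of `item`: transactions containing it,
        # restricted to items that rank above `item` in the local order
        conditional = []
        for items, c in pattern_base:
            if item in items:
                kept = frozenset(i for i in items if i in rank and rank[i] > rank[item])
                if kept:
                    conditional.append((kept, c))
        if conditional:
            yield from fp_growth(conditional, min_support, new_suffix)

db = [{"M","O","N","K","E","Y"}, {"D","O","N","K","E","Y"},
      {"M","A","K","E"}, {"M","U","C","K","Y"}, {"C","O","O","K","I","E"}]
for itemset, support in fp_growth([(frozenset(t), 1) for t in db], min_support=3):
    print(set(itemset), support)
```

On this sample database with min_support = 3, the output matches the frequent itemsets found by the Apriori sketch earlier.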
Why Is FP-Growth the Winner?
• Divide-and-conquer:
– decompose both the mining task and DB according to the frequent
patterns obtained so far
– leads to focused search of smaller databases
• Other factors
– no candidate generation, no candidate test
– compressed database: FP-tree structure
– no repeated scan of entire database
– basic operations are counting local frequent items and building sub FP-trees; no
pattern search and matching