
CS6140: Machine Learning

Week 4 – Clustering, Mixture Models, & Expectation Maximization

Dr. Ryan Rad

Summer 2024
Today’s Agenda

• Group Quiz
• Unsupervised Learning
• Clustering
• K-Means
• Gaussian Mixture Model & EM
• Hierarchical Clustering

Performance Evaluation
in predictive modeling
Warm-up exercise

Scenario #1

• Classification with 2 classes – Score: 52%

Scenario #2

• Classification with 8 classes – Score: 24%

Which model is better?


Quizzes

• Ji Chen, Jun Tan, Liyao Zhang
• Dana Zhan, Jun Tan, Peinan Wang
Supervised vs. Unsupervised

Figure from: Goyal, Anil. (2018). Learning a Multiview Weighted Majority Vote Classifier: Using PAC-Bayesian Theory and Boosting.
https://miro.medium.com/max/4096/0*p3zNp9D8YFmX32am.jpg
Classical Machine Learning Methods
What is clustering?

Goal: Grouping items that “belong together” (i.e. have similar features)

Constraint: We only use the features X, not the labels Y (unsupervised)

Grouping unlabeled examples is called clustering

Why do we need clustering?

Lack of Labeled Data


• Data labeling (annotation) is not only expensive and labor-intensive, but sometimes impossible.

Summarize and Understand the Data


• Clustering helps in understanding the natural grouping in a dataset; the goal is to partition the data into logically meaningful groups.

Outlier Detection
• Clustering can help detect anomalies and outliers in the data

Privacy Preservation
• Clustering can help preserve privacy by associating user data with cluster IDs instead of specific users.
Clustering Applications (Customer segmentation)

Who is your:
• Most valued customer?
• Customer most at risk of churn?
• Customer most likely to be interested in a particular promotion/product?
• …
Unsupervised Learning (clustering)

[Scatter plot: customer Spend ($) vs. Number of Purchases, with clusters identifying the targeted group of customers to sell specific products to]
Unsupervised Learning (clustering)
Create clusters using World Economic Indicators data
What makes a GOOD clustering?

Which one demonstrates a “better” clustering?


What makes a good clustering?

• Organizing data into clusters such that there is:


• high intra-cluster similarity (better association)
• low inter-cluster similarity (more distinct clusters)
Silhouette Score

The Silhouette Score is a single metric that combines both intra-cluster similarity (compactness within clusters) and inter-cluster similarity (separation between clusters). It provides a comprehensive evaluation of cluster quality.

How the Silhouette Score Works:


• For each data point, calculate:
  • The average distance to all other points within the same cluster (intra-cluster distance, a)
  • The average distance to all points in the nearest neighboring cluster (inter-cluster distance, b)
• The point's silhouette value is then s = (b − a) / max(a, b), and the Silhouette Score is the average of s over all points.
Interpretation:
• High Silhouette Score (closer to 1): Indicates that the data point is well-matched to its own cluster
and poorly matched to neighboring clusters. This suggests good clustering quality.
• Low Silhouette Score (closer to -1): Suggests that the data point may have been assigned to the
wrong cluster, as it is more similar to points in a neighboring cluster.
• Silhouette Score around 0: Indicates that the data point is on or very close to the decision boundary
between two neighboring clusters.
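
As a quick illustration (not from the slides), here is how the Silhouette Score can be computed with scikit-learn; the blob dataset and the choice of k = 3 are assumptions made purely for this example:

```python
# Hypothetical example: computing the Silhouette Score with scikit-learn.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Toy data: 300 points drawn from 3 well-separated blobs (assumed for illustration).
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=42)

# Cluster with k-means, then score the resulting labels.
labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)

# silhouette_score averages (b - a) / max(a, b) over all points: close to 1 is good.
print("Silhouette Score:", silhouette_score(X, labels))
```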
Types of Clustering Methods

• Centroid-based Clustering: k-means is the most widely-used centroid-based clustering algorithm.

• Density-based Clustering: Density-based clustering connects areas of high example density into clusters.

• Distribution-based Clustering: This clustering approach assumes data is composed of distributions, such as Gaussian distributions.

• Hierarchical Clustering: Hierarchical clustering creates a tree of clusters. Hierarchical clustering, not surprisingly, is well suited to hierarchical data, such as taxonomies.
Photos taken from: https://developers.google.com/machine-learning/clustering/clustering-algorithms


Slide credit: Andrew Ng
K-means algorithm
• Input:
  • $K$ (number of clusters)
  • Training set $\{x^{(1)}, x^{(2)}, x^{(3)}, \ldots, x^{(m)}\}$
  • $x^{(i)} \in \mathbb{R}^n$ (note: drop the $x_0 = 1$ convention)
Slide credit: Andrew Ng


K-means algorithm
• Randomly initialize $K$ cluster centroids $\mu_1, \mu_2, \ldots, \mu_K \in \mathbb{R}^n$

Repeat {
  for $i = 1$ to $m$
    $c^{(i)} :=$ index (from 1 to $K$) of cluster centroid closest to $x^{(i)}$

  for $k = 1$ to $K$
    $\mu_k :=$ average (mean) of points assigned to cluster $k$
}

Slide credit: Andrew Ng
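
A minimal NumPy sketch of the loop above (my own illustration, not the course's reference code); `X` is assumed to be an m-by-n data matrix and `K` the desired number of clusters:

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """Plain Lloyd's algorithm: alternate cluster assignment and centroid update."""
    rng = np.random.default_rng(seed)
    # Randomly initialize K centroids by picking K distinct training points.
    mu = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iters):
        # Cluster assignment step: c[i] = index of the closest centroid to x[i].
        dists = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)  # shape (m, K)
        c = dists.argmin(axis=1)
        # Centroid update step: mu[k] = mean of the points assigned to cluster k.
        new_mu = np.array([X[c == k].mean(axis=0) if np.any(c == k) else mu[k]
                           for k in range(K)])
        if np.allclose(new_mu, mu):   # assignments stopped changing: local optimum
            break
        mu = new_mu
    return c, mu
```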


K-means Objective Function
• $c^{(i)}$ = index of the cluster (1, 2, ..., $K$) to which example $x^{(i)}$ is currently assigned
• $\mu_k$ = cluster centroid $k$ ($\mu_k \in \mathbb{R}^n$)
• $\mu_{c^{(i)}}$ = cluster centroid of the cluster to which example $x^{(i)}$ has been assigned
• Optimization objective ("inertia"):

$$J\left(c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K\right) = \frac{1}{m} \sum_{i=1}^{m} \left\lVert x^{(i)} - \mu_{c^{(i)}} \right\rVert^2$$

$$\min_{\substack{c^{(1)}, \ldots, c^{(m)} \\ \mu_1, \ldots, \mu_K}} J\left(c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K\right)$$

Slide credit: Andrew Ng
K-means Objective Function
Randomly initialize $K$ cluster centroids $\mu_1, \mu_2, \ldots, \mu_K \in \mathbb{R}^n$

Repeat {

  Cluster assignment step: minimize $J\left(c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K\right) = \frac{1}{m} \sum_{i=1}^{m} \left\lVert x^{(i)} - \mu_{c^{(i)}} \right\rVert^2$ with respect to the assignments $c^{(1)}, \ldots, c^{(m)}$ (centroids held fixed)

  for $i = 1$ to $m$
    $c^{(i)} :=$ index (from 1 to $K$) of cluster centroid closest to $x^{(i)}$

  Centroid update step: minimize $J$ with respect to the centroids $\mu_1, \ldots, \mu_K$ (assignments held fixed)

  for $k = 1$ to $K$
    $\mu_k :=$ average (mean) of points assigned to cluster $k$

}

Slide credit: Andrew Ng
What does it mean for something to converge to a local optimum?

Initial settings will greatly impact results!
Complexity of K-Means

The time complexity of the K-means algorithm is generally expressed as:

$$O(k \cdot n \cdot d \cdot i)$$

where:
• k is the number of clusters
• n is the number of data points
• d is the number of dimensions of each data point
• i is the number of iterations required for convergence
Getting smarter with initialization

Making sure the initialized centroids are "good" is critical to finding a quality local optimum. Our purely random approach was wasteful, since it's very possible that initial centroids start close together.

Idea: Try to select a set of points farther away from each other.

K-Means++
k-means++ does a slightly smarter random initialization (a sketch in code follows the steps below):

1. Choose the first centroid $\mu_1$ from the data uniformly at random
2. For the current set of centroids (starting with just $\mu_1$), compute the distance between each data point and its closest centroid
3. Choose a new centroid from the remaining data points, with the probability of $x_i$ being chosen proportional to $d(x_i)^2$
4. Repeat 2 and 3 until we have selected $k$ centroids
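
A sketch of this initialization in NumPy (an assumed, simplified implementation; in practice scikit-learn's `KMeans(init="k-means++")` handles this for you):

```python
import numpy as np

def kmeans_pp_init(X, K, seed=0):
    """k-means++ initialization: spread the initial centroids apart."""
    rng = np.random.default_rng(seed)
    # 1. Choose the first centroid uniformly at random from the data.
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, K):
        # 2. Squared distance from each point to its closest already-chosen centroid.
        d2 = np.min(np.linalg.norm(X[:, None, :] - np.array(centroids)[None, :, :],
                                   axis=2) ** 2, axis=1)
        # 3. Sample the next centroid with probability proportional to d(x)^2.
        probs = d2 / d2.sum()
        centroids.append(X[rng.choice(len(X), p=probs)])
    # 4. Repeat until K centroids have been selected.
    return np.array(centroids)
```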
K-Means++ Pros / Cons

Pros
• Improves quality of local minima
• Faster convergence to local minima

Cons
• Computationally more expensive at the beginning when compared to simple random initialization
Problems with k-means

• In real life, cluster assignments are not always clear cut


• E.g. The moon landing: Science? World News? Conspiracy?

• Because we minimize Euclidean distance, k-means


assumes all the clusters are spherical

• We can change this with weighted Euclidean distance


• Still assumes every cluster is the same shape/orientation

Failure Modes of k-means

• If we don't meet the assumption of spherical clusters, we will get unexpected results:
  • disparate cluster sizes
  • overlapping clusters
  • different shaped/oriented clusters
Quiz Time!
How to choose k?

• No right answer! Depends on your application.
• In general, look for the "elbow" in the graph of cluster heterogeneity versus the number of clusters k (see the sketch below).

[Elbow plot: cluster heterogeneity (y-axis, aiming for the lowest possible value) vs. # of clusters k (x-axis)]

Note: You will usually have to run k-means multiple times for each k
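
A small sketch of the elbow heuristic using scikit-learn's inertia (the objective J from earlier); the toy data and the range of k values are arbitrary choices for illustration:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Toy data with a few underlying groups (assumed for illustration).
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=1.0, random_state=0)

inertias = []
for k in range(1, 11):
    # n_init=10 re-runs k-means with different initializations and keeps the best,
    # since each individual run only converges to a local optimum.
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(km.inertia_)

for k, j in zip(range(1, 11), inertias):
    print(f"k={k:2d}  inertia={j:,.1f}")
# Look for the "elbow": the k after which inertia stops dropping sharply.
```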
Gaussian Mixture Model (GMM)
Gaussian Mixture Model (GMM)

It turns out the univariate (one-dimensional) Gaussian can be extended to the multivariate (multi-dimensional) case. The form of a d-dimensional Gaussian is:

$$\mathcal{N}(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}} \exp\!\left(-\tfrac{1}{2}(x-\mu)^{\top}\Sigma^{-1}(x-\mu)\right)$$

In higher dimensions, a Gaussian is fully specified by a mean vector $\mu$ and a d-by-d covariance matrix $\Sigma$. Here $|\Sigma|$ refers to the determinant of the covariance matrix.

In two dimensions, the Gaussian's parameters might look like this:

$$\mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{bmatrix}$$


Mixture of Gaussians
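
For reference (standard notation, consistent with the multivariate Gaussian above but not shown in the extracted slide text), a mixture of K Gaussians has density:

$$p(x) = \sum_{k=1}^{K} \pi_k \,\mathcal{N}\!\left(x \mid \mu_k, \Sigma_k\right), \qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1$$

where the $\pi_k$ are the mixing weights and each component has its own mean $\mu_k$ and covariance $\Sigma_k$.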

For example, consider the following GMM:


Clarification (Probability vs Likelihood)

Probability Density Function (PDF), or density of a continuous random variable, is a function whose value at
any given sample (or point) in the sample space (the set of possible values taken by the random variable) can be
interpreted as providing a relative likelihood that the value of the random variable would be close to that sample.*

𝒇 𝒙 : a (continuous) probability density function

If X is continuous, the probability that X takes on any specific value x is 0.

In a nutshell: Probability refers to the chance that a particular


outcome occurs based on the values of parameters in a model.
Likelihood refers to how well a sample provides support for
particular values of a parameter in a model.

* https://en.wikipedia.org/wiki/Probability_density_function
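
A tiny illustration of the distinction (my own example using SciPy; the sample values and candidate means are arbitrary assumptions):

```python
from scipy.stats import norm

# Probability: fix the model (mu=0, sigma=1) and ask about outcomes.
# For a continuous X, P(X = 1.0) is 0, but the density f(1.0) is positive,
# and P(0.5 <= X <= 1.5) is a genuine probability.
print("density f(1.0)     =", norm(loc=0, scale=1).pdf(1.0))
print("P(0.5 <= X <= 1.5) =", norm(loc=0, scale=1).cdf(1.5) - norm(loc=0, scale=1).cdf(0.5))

# Likelihood: fix the observed sample and ask how well different parameter
# values (here, two candidate means) explain it.
sample = [0.9, 1.1, 1.3]
for mu in (0.0, 1.0):
    L = 1.0
    for x in sample:
        L *= norm(loc=mu, scale=1).pdf(x)   # product of densities = likelihood
    print(f"likelihood of sample under mu={mu}: {L:.4f}")
```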
Expectation Maximization (EM)

E step:
• After randomly initializing a mean and variance for each cluster, compute the likelihood (responsibility) of each data point $x_i$ coming from each Gaussian.

M step:
• Use the probability assignments from the previous E step to re-estimate each Gaussian's mean and variance to better fit the data points.
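
In practice you rarely hand-code EM for GMMs; scikit-learn's `GaussianMixture` runs the E and M steps internally until convergence. A minimal sketch (the toy data and number of components are assumptions for illustration):

```python
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Toy data with 3 underlying groups (assumed for illustration).
X, _ = make_blobs(n_samples=500, centers=3, cluster_std=1.2, random_state=0)

# GaussianMixture alternates E steps (compute responsibilities) and
# M steps (re-estimate means, covariances, mixing weights) until convergence.
gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=0).fit(X)

hard_labels = gmm.predict(X)          # most likely component per point
soft_labels = gmm.predict_proba(X)    # responsibilities: soft cluster assignments

print("component means:\n", gmm.means_)
print("first point's responsibilities:", soft_labels[0].round(3))
```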
Expectation Maximization (EM) Algorithm
Motivation

If we try to learn clusters in hierarchies, we can


• Avoid choosing the # of clusters beforehand
• Use dendrograms to help visualize different granularities of
clusters
• Allow us to use any distance metric
o K-means requires Euclidean distance
• Can often find more complex shapes than k-means
Discovering Shapes

[Figure: cluster shapes discovered by k-means vs. mixture models]
Hierarchical Clustering Methods

Divisive, a.k.a. top-down
• Start with all the data in one big cluster and then recursively split the data into smaller clusters
• Example: recursive k-means

Agglomerative, a.k.a. bottom-up
• Start with each data point in its own cluster. Merge clusters until all points are in one big cluster.
• Example: single linkage
Divisive Clustering

Start with all the data in one cluster, and then run k-means to divide the data into smaller clusters.
Repeatedly run k-means on each cluster to make sub-clusters.
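
A rough sketch of that recursive idea (my own illustration, not the course's code; it bisects each cluster with 2-means and the `min_size` and `max_depth` stopping rules are arbitrary assumptions):

```python
import numpy as np
from sklearn.cluster import KMeans

def divisive_kmeans(X, min_size=20, depth=0, max_depth=3):
    """Recursively split X with 2-means; stop on small clusters or max depth."""
    if len(X) <= min_size or depth >= max_depth:
        return [X]                      # leaf cluster: stop splitting
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    clusters = []
    for k in (0, 1):
        clusters += divisive_kmeans(X[labels == k], min_size, depth + 1, max_depth)
    return clusters

# Example: split 200 random 2-D points into a small hierarchy of clusters.
X = np.random.default_rng(0).normal(size=(200, 2))
leaves = divisive_kmeans(X)
print("leaf cluster sizes:", [len(c) for c in leaves])
```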
Divisive Clustering - Example

Wikipedia
├── Athletes
│   ├── Baseball
│   └── Soccer / Ice hockey
└── Non-athletes
    ├── Musicians, artists, actors
    └── Scholars, politicians, government officials
Choices to Make

For divisive clustering, you need to make the following choices:

• Which algorithm to use
• How many clusters per split
• When to split vs. when to stop:
  • Max cluster size: stop when the number of points in a cluster falls below a threshold
  • Max cluster radius: stop when the distance to the furthest point falls below a threshold
  • Specified # of clusters: split until a pre-specified # of clusters is reached
Agglomerative Clustering: Algorithm at a glance

• Initialize each point in its own cluster
• Define a distance metric between clusters
• While there is more than one cluster:
  • Merge the two closest clusters
Step 1

1. Initialize each point to be its own cluster


Step 2

2. Define a distance metric between clusters

Single Linkage

$$\mathrm{distance}(C_1, C_2) = \min_{x_i \in C_1,\; x_j \in C_2} d(x_i, x_j)$$
This formula means we are defining the distance between two clusters as the smallest distance between any pair of points, one drawn from each cluster.
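
A brief sketch using SciPy's hierarchical clustering tools (single linkage as defined above; the toy data is an assumption made for illustration):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

# Toy data: 30 two-dimensional points (assumed for illustration).
X = np.random.default_rng(0).normal(size=(30, 2))

# 'single' linkage merges, at each step, the two clusters whose closest
# pair of points (one from each cluster) has the smallest distance.
Z = linkage(X, method="single", metric="euclidean")

dendrogram(Z)                      # tree of merges; height = merge distance
plt.xlabel("data points")
plt.ylabel("merge distance")
plt.show()
```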
Step 3

3. Merge the two closest clusters (then repeat for the next closest pair)
Repeat

4. Repeat step 3 until all points are in one cluster

• Notice that the height of the dendrogram grows as we merge groups of points that are farther from each other.
• Looking at the dendrogram, we can see there is a bit of an outlier! You can tell by seeing a point join a cluster at a really large distance.
• The tall links in the dendrogram show us we are merging clusters that are far away from each other.

Final Result

• Final result after merging all clusters.
Cut Dendrogram

• Choose a distance $D^*$ to "cut" the dendrogram
• Use the largest clusters with distance $< D^*$
• Usually ignore the idea of the nested clusters after cutting
• Every branch that crosses $D^*$ becomes its own cluster (see the SciPy sketch below)
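
Cutting the tree at a distance D* can be done with SciPy's `fcluster` (continuing the earlier linkage sketch; the threshold value is an arbitrary assumption):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

X = np.random.default_rng(0).normal(size=(30, 2))   # same toy data as before
Z = linkage(X, method="single")

D_star = 0.5                                         # assumed cut height D*
# Every branch of the dendrogram that crosses D* becomes its own flat cluster.
labels = fcluster(Z, t=D_star, criterion="distance")
print("number of clusters:", labels.max())
```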
Coming up Next…

• Week 5 – Nearest Neighbor Methods & Dimensionality Reduction (PCA)
• Homework #2 due May 31 (@ 7pm Pacific Time)
• I'll introduce course projects in Week 6
• Team Formation due June 14
• Course Survey due June 14
Lab Session:
Clustering with K-Means,
GMM, Agglomerative
Questions?
