Image Clustering

Prof. Dr. Rafiqul Islam
Department of CSE
Introduction to Clustering
• Clustering: the process of grouping a set of objects into classes of similar objects.
– Documents within a cluster should be similar.
– Documents from different clusters should be dissimilar.

8/12/2020 Dr. Rafiqul Islam 2



Introduction to Clustering
• How would you design an algorithm for finding the three clusters in this case?


Clustering Methods
• Centroid-based methods
• Connectivity-based methods
• Distribution-based methods
• Density-based methods


Clustering Method: K-means

• The K-means clustering algorithm is an unsupervised algorithm; in image analysis it is often used to segment the area of interest from the background.
• It clusters, or partitions, the given data into K clusters based on K centroids.
• The algorithm is used when you have unlabeled data (i.e., data without defined categories or groups).
• The goal is to find groups in the data based on some kind of similarity, with the number of groups represented by K.


Clustering Method: K-means

• Steps in the K-means algorithm:

1. Choose the number of clusters K.
2. Select K points at random as the centroids (not necessarily from your dataset).
3. Assign each data point to the closest centroid → that forms K clusters.
4. Compute and place the new centroid of each cluster.
5. Reassign each data point to the new closest centroid. If any reassignment took place, go to step 4; otherwise, the model is ready.
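The five steps above can be sketched directly in Python. This is a minimal illustration, not the lecturer's code: the function name and data layout are my own choices. Run on the 8 example points used later in the lecture, with seeds A1, A4, and A7, it converges in a few iterations.

```python
from math import dist  # Euclidean distance, Python 3.8+

def k_means(points, seeds):
    """Minimal K-means following steps 1-5 above."""
    centroids = list(seeds)               # step 2: initial centroids
    assignment = None
    while True:
        # Step 3: assign each point to the closest centroid
        new_assignment = [min(range(len(centroids)),
                              key=lambda j: dist(p, centroids[j]))
                          for p in points]
        if new_assignment == assignment:  # step 5: no reassignment -> done
            return assignment, centroids
        assignment = new_assignment
        # Step 4: recompute each cluster's centroid as the mean of its members
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))

points = [(2, 10), (2, 5), (8, 4), (5, 8), (7, 5), (6, 4), (1, 2), (4, 9)]  # A1..A8
labels, centers = k_means(points, seeds=[(2, 10), (5, 8), (1, 2)])          # A1, A4, A7
```

Ties in distance are broken by taking the lowest cluster index; the loop terminates because the assignment eventually stops changing.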


Clustering Method: K-means

• Use the k-means algorithm and Euclidean distance to cluster the following 8 examples into 3 clusters: A1=(2,10), A2=(2,5), A3=(8,4), A4=(5,8), A5=(7,5), A6=(6,4), A7=(1,2), A8=(4,9).
• The initial seeds (centers of each cluster) are A1, A4, and A7. Run the k-means algorithm.


Clustering Method: K-means

First iteration, with Seed1 = A1 = (2, 10), Seed2 = A4 = (5, 8), Seed3 = A7 = (1, 2):

D(A1, Seed1) = 0.00    D(A2, Seed1) = 5.00    D(A3, Seed1) = 8.49
D(A1, Seed2) = 3.61    D(A2, Seed2) = 4.24    D(A3, Seed2) = 5.00
D(A1, Seed3) = 8.06    D(A2, Seed3) = 3.16    D(A3, Seed3) = 7.28

D(A4, Seed1) = 3.61    D(A5, Seed1) = 7.07    D(A6, Seed1) = 7.21
D(A4, Seed2) = 0.00    D(A5, Seed2) = 3.61    D(A6, Seed2) = 4.12
D(A4, Seed3) = 7.21    D(A5, Seed3) = 6.71    D(A6, Seed3) = 5.39

D(A7, Seed1) = 8.06    D(A8, Seed1) = 2.24
D(A7, Seed2) = 7.21    D(A8, Seed2) = 1.41
D(A7, Seed3) = 0.00    D(A8, Seed3) = 7.62

Cluster 01: A1          Cluster 02: A3, A4, A5, A6, A8    Cluster 03: A2, A7
New Center: (2, 10)     New Center: (6, 6)                New Center: (1.5, 3.5)
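This first iteration can be reproduced programmatically. A short sketch (variable names are mine, not from the slides):

```python
from math import dist  # Euclidean distance, Python 3.8+

points = {"A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
          "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9)}
seeds = [points["A1"], points["A4"], points["A7"]]

# Step 3: assign each point to its nearest seed
assign = {name: min(range(3), key=lambda j: dist(p, seeds[j]))
          for name, p in points.items()}

# Step 4: new centroid of each cluster = coordinate-wise mean of its members
centers = []
for j in range(3):
    members = [p for name, p in points.items() if assign[name] == j]
    centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
```

The computed assignment and the three new centers match the table above.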




Nearest Neighbor clustering
• The k-nearest neighbors algorithm (k-NN) is a non-parametric method proposed by Thomas Cover, used for classification.
• In k-NN classification, the output is a class membership. An object is classified by a plurality vote of its neighbors: the object is assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small).
• If k = 1, then the object is simply assigned to the class of its single nearest neighbor.


Nearest Neighbor clustering
• Initialize K to your chosen number of neighbors.
• For each example in the data:
– Calculate the distance between the query example and the current example from the data.
– Add the distance and the index of the example to an ordered collection.
• Sort the ordered collection of distances and indices from smallest to largest (in ascending order) by the distances.
• Pick the first K entries from the sorted collection.
• Get the labels of the selected K entries.
• Return the most common label among them (the plurality vote).
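The steps above, combined with the plurality vote, fit in a few lines of Python. This is an illustrative sketch with made-up toy data, not code from the lecture:

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+

def knn_classify(query, examples, k):
    """examples: list of ((x, y), label) pairs; follows the steps above."""
    # Distance and label for every example, sorted ascending by distance
    ordered = sorted((dist(query, point), label) for point, label in examples)
    # First K entries -> their labels -> plurality vote
    top_labels = [label for _, label in ordered[:k]]
    return Counter(top_labels).most_common(1)[0][0]

examples = [((0, 0), "x"), ((1, 0), "x"), ((5, 5), "y")]  # hypothetical labeled data
```

For instance, a query near (0, 0) is labeled "x" for k = 1 and, with two "x" neighbors outvoting one "y", also for k = 3.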


Nearest Neighbor clustering
• Use the Nearest Neighbor clustering algorithm and Euclidean distance to cluster the examples from the previous exercise: A1=(2,10), A2=(2,5), A3=(8,4), A4=(5,8), A5=(7,5), A6=(6,4), A7=(1,2), A8=(4,9).
• Suppose that the threshold value (t) is 4.
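A common formulation of threshold-based nearest neighbor clustering is: present the points one at a time, and let each new point join the cluster of its nearest already-seen point if that distance is at most t; otherwise it starts a new cluster. A minimal sketch under that assumption (the function name is mine):

```python
from math import dist  # Euclidean distance, Python 3.8+

def nn_cluster(points, t):
    """Each new point joins the cluster of its nearest already-seen
    point when that distance is <= t; otherwise it starts a new cluster."""
    clusters = [[points[0]]]                     # first point opens cluster 1
    for p in points[1:]:
        # nearest already-clustered point, remembering its cluster index
        d, i = min((dist(p, q), i)
                   for i, cluster in enumerate(clusters) for q in cluster)
        if d <= t:
            clusters[i].append(p)
        else:
            clusters.append([p])
    return clusters

points = [(2, 10), (2, 5), (8, 4), (5, 8), (7, 5), (6, 4), (1, 2), (4, 9)]  # A1..A8
clusters = nn_cluster(points, t=4)
```

With t = 4 and the points processed in order A1..A8, this yields three clusters: {A1, A4, A8}, {A2, A7}, and {A3, A5, A6}.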




DBSCAN Clustering
• Density-based clustering refers to unsupervised learning methods that identify distinctive groups/clusters in the data, based on the idea that a cluster in data space is a contiguous region of high point density, separated from other such clusters by contiguous regions of low point density.
• Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is a base algorithm for density-based clustering.
• It can discover clusters of different shapes and sizes in large amounts of data that may contain noise and outliers.


DBSCAN Clustering
• The DBSCAN algorithm uses two parameters:
• minPts: the minimum number of points (a threshold) clustered together for a region to be considered dense.
• eps (ε): a distance measure used to locate the points in the neighborhood of any point.


DBSCAN Clustering
• These parameters can be understood through two concepts called Density Reachability and Density Connectivity.
• Reachability in terms of density establishes a point to be reachable from another if it lies within a particular distance (eps) of it.
• Connectivity, on the other hand, involves a transitivity-based chaining approach to determine whether points are located in a particular cluster. For example, points p and q could be connected if p → r → s → t → q, where a → b means b is in the neighborhood of a.


DBSCAN Clustering

• Core: a point that has at least minPts points within distance ε of itself.
• Border: a point that has at least one Core point within distance ε.
• Noise: a point that is neither a Core nor a Border; it has fewer than minPts points within distance ε of itself.
DBSCAN Clustering
• Algorithmic steps for DBSCAN clustering:
• The algorithm proceeds by arbitrarily picking a point in the dataset (until all points have been visited).
• If there are at least minPts points within a radius of ε of that point, we consider all of them to be part of the same cluster.
• The clusters are then expanded by recursively repeating the neighborhood calculation for each neighboring point.
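The steps above can be sketched as a minimal DBSCAN. This is my own simplified sketch, assuming a point's ε-neighborhood includes the point itself (so minPts counts the point too, the convention that reproduces the worked example that follows):

```python
from math import dist  # Euclidean distance, Python 3.8+

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN; a point's neighborhood includes the point itself."""
    neighbors = {p: [q for q in points if dist(p, q) <= eps] for p in points}
    labels, cluster = {}, 0
    for p in points:
        if p in labels or len(neighbors[p]) < min_pts:
            continue                           # already clustered, or not a core point
        cluster += 1                           # start a new cluster from this core
        frontier = [p]
        while frontier:                        # expand the cluster
            q = frontier.pop()
            if q in labels:
                continue
            labels[q] = cluster
            if len(neighbors[q]) >= min_pts:   # recurse only through core points
                frontier.extend(neighbors[q])
    noise = [p for p in points if p not in labels]
    return labels, noise

points = [(2, 10), (2, 5), (8, 4), (5, 8), (7, 5), (6, 4), (1, 2), (4, 9)]  # A1..A8
labels, noise = dbscan(points, eps=2, min_pts=2)
```

With ε = 2 and minPts = 2 on the lecture's 8 points, this marks A1, A2, and A7 as noise and finds the two clusters {A3, A5, A6} and {A4, A8}.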


DBSCAN Clustering
• If ε is 2 and minPts is 2, what are the clusters that DBSCAN would discover with the following 8 examples: A1=(2,10), A2=(2,5), A3=(8,4), A4=(5,8), A5=(7,5), A6=(6,4), A7=(1,2), A8=(4,9)?
• Draw the 10-by-10 space and illustrate the discovered clusters. What if ε is increased to 10?


DBSCAN Clustering
• The Euclidean distance matrix (values rounded to two decimals) is:

      A1    A2    A3    A4    A5    A6    A7    A8
A1  0.00  5.00  8.49  3.61  7.07  7.21  8.06  2.24
A2        0.00  6.08  4.24  5.00  4.12  3.16  4.47
A3              0.00  5.00  1.41  2.00  7.28  6.40
A4                    0.00  3.61  4.12  7.21  1.41
A5                          0.00  1.41  6.71  5.00
A6                                0.00  5.39  5.39
A7                                      0.00  7.62
A8                                            0.00
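The matrix can be generated in a few lines of Python (the formatting choices are mine):

```python
from math import dist  # Euclidean distance, Python 3.8+

pts = {"A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
       "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9)}

names = list(pts)
print("    " + "".join(f"{n:>6}" for n in names))   # header row
for a in names:
    row = "".join(f"{dist(pts[a], pts[b]):6.2f}" for b in names)
    print(f"{a:<4}{row}")                           # one distance row per point
```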


DBSCAN Clustering
• N2(x) denotes the set of points within distance ε = 2 of x (excluding x itself):
• N2(A1) = {}; N2(A2) = {}; N2(A3) = {A5, A6}; N2(A4) = {A8}; N2(A5) = {A3, A6}; N2(A6) = {A3, A5}; N2(A7) = {}; N2(A8) = {A4}
• So A1, A2, and A7 are outliers, while we have two clusters C1 = {A4, A8} and C2 = {A3, A5, A6}.




DBSCAN Clustering
• If ε is increased to 10, then the neighborhoods of some points will grow: A1 would join cluster C1, and A2 would join with A7 to form cluster C3 = {A2, A7}.


Clustering Methods
• Single Linkage Method
• Complete Linkage Method
• Hierarchical Method
