Data Mining Functionalities
A hierarchical clustering method works by grouping data
objects into a hierarchy or “tree” of clusters.
Representing data objects in the form of a hierarchy is
useful for data summarization and visualization.
Agglomerative versus Divisive Hierarchical Clustering
A hierarchical clustering method can be either agglomerative or divisive, depending on whether the hierarchical decomposition is formed in a bottom-up (merging) or top-down (splitting) fashion.
An agglomerative hierarchical clustering method uses a bottom-up
strategy.
It typically starts by letting each object form its own cluster and
iteratively merges clusters into larger and larger clusters, until all the
objects are in a single cluster or certain termination conditions are
satisfied.
The single cluster becomes the hierarchy’s root.
For the merging step, it finds the two clusters that are closest to each
other (according to some similarity measure), and combines the two to
form one cluster.
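As a concrete sketch, this bottom-up merging can be run with scikit-learn's AgglomerativeClustering; the toy 2-D points and the choices of n_clusters=2 and single linkage below are illustrative assumptions, not part of the text.

import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Six toy points forming two well-separated groups (illustrative data).
X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])

# Each point starts in its own cluster; the two closest clusters are
# merged repeatedly until only n_clusters clusters remain.
model = AgglomerativeClustering(n_clusters=2, linkage="single")
print(model.fit_predict(X))  # e.g. [1 1 1 0 0 0] (label numbering may vary)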
A divisive hierarchical clustering method employs a top-down
strategy.
It starts by placing all objects in one cluster, which is the hierarchy’s
root.
It then divides the root cluster into several smaller subclusters, and
recursively partitions those clusters into smaller ones.
The partitioning process continues until each cluster at the lowest
level is coherent enough—either containing only one object, or the
objects within a cluster are sufficiently similar to each other.
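Scikit-learn offers no direct divisive method, so the sketch below realizes the top-down strategy by recursively bisecting clusters with k-means (the "bisecting k-means" idea, one common way to implement divisive clustering); the toy data, the max_size stopping rule, and the helper name divisive are illustrative assumptions.

import numpy as np
from sklearn.cluster import KMeans

def divisive(points, max_size=2):
    # Stop splitting once a cluster is "coherent enough" (here: small).
    if len(points) <= max_size:
        return [points]
    # Split the current cluster into two subclusters with k-means.
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(points)
    return (divisive(points[labels == 0], max_size)
            + divisive(points[labels == 1], max_size))

X = np.array([[1.0, 2.0], [1.5, 1.8], [8.0, 8.0],
              [8.2, 7.9], [25.0, 30.0], [24.5, 29.5]])
for cluster in divisive(X):  # recursion starts from the root cluster holding all points
    print(cluster)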
Hierarchical Clustering
Clusters are merged based on the distance between them; to calculate the distance between clusters, several types of linkage are used.
Linkage Criteria:
A linkage criterion determines the distance between sets of observations as a function of the pairwise distances between observations.
In Single Linkage, the distance between two clusters is the minimum distance between members of the two clusters.
In Complete Linkage, the distance between two clusters is the maximum distance between members of the two clusters.
In Average Linkage, the distance between two clusters is the average of all pairwise distances between members of the two clusters.
In Centroid Linkage, the distance between two clusters is the distance between their centroids.
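To see how the four criteria differ in practice, the sketch below computes the merge distances for the same toy 1-D points under each linkage with SciPy; the points are illustrative.

import numpy as np
from scipy.cluster.hierarchy import linkage

X = np.array([[1.0], [2.0], [6.0], [10.0]])  # illustrative 1-D points

for method in ("single", "complete", "average", "centroid"):
    Z = linkage(X, method=method)
    # Each row of Z is [cluster_i, cluster_j, merge_distance, new_size];
    # column 2 lists the distances at which merges happened.
    print(method, Z[:, 2])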
Hierarchical Clustering
Objective: For the one-dimensional data set {7, 10, 20, 28, 35}, perform hierarchical clustering and plot the dendrogram to visualize it.
Let’s solve the problem by hand using both linkage types for agglomerative hierarchical clustering:
Single Linkage: In single-link hierarchical clustering, we merge at each step the two clusters whose closest members have the smallest distance.
Complete Linkage: In complete-link hierarchical clustering, we merge at each step the two clusters whose merger has the smallest maximum pairwise distance.
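Working the merges out by hand on {7, 10, 20, 28, 35}: single linkage merges at distances 3, 7, 8, 10, while complete linkage merges at distances 3, 7, 13, 28. The sketch below, assuming SciPy's linkage and dendrogram functions and Euclidean distance on the 1-D points, reproduces both dendrograms.

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

X = np.array([[7.0], [10.0], [20.0], [28.0], [35.0]])

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, method in zip(axes, ("single", "complete")):
    # Single linkage merges at heights 3, 7, 8, 10;
    # complete linkage merges at heights 3, 7, 13, 28.
    Z = linkage(X, method=method)
    dendrogram(Z, labels=[7, 10, 20, 28, 35], ax=ax)
    ax.set_title(f"{method} linkage")
plt.show()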
Fundamental Working Of KNN (K-Nearest Neighbors)
KNN is a supervised machine learning algorithm.
Assume that we have a dataset in which the data points are
classified into 2 categories.
✔ Now, when a new data point comes in, the KNN algorithm will predict which category or group the new data point belongs to.
✔ Once our model is able to classify new data points as category A or category B, we can say that our model is ready to make predictions.
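A minimal KNN sketch with scikit-learn, assuming two illustrative categories A and B, toy 2-D points, and k = 3 neighbors:

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[1, 2], [2, 3], [3, 1],      # category A points
              [8, 8], [9, 10], [10, 9]])   # category B points
y = ["A", "A", "A", "B", "B", "B"]

# A new point is assigned the majority class among its k nearest
# neighbors in the training data.
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)

print(knn.predict([[2, 2]]))  # ['A'] -- nearest neighbors are in group A
print(knn.predict([[9, 9]]))  # ['B'] -- nearest neighbors are in group B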