Expt 8 Divisive Algorithm Based On MST 121A1091
Expt 8 Divisive Algorithm Based On MST 121A1091
Expt 8 Divisive Algorithm Based On MST 121A1091
Theory:
Introduction and Concept:
Divisive Clustering is a top-down hierarchical clustering algorithm, where the
entire dataset starts as a single cluster and is iteratively split into smaller clusters until
specific criteria are met. Graph-based clustering leverages the relationship between
data points by representing them as nodes connected by weighted edges, and clusters
are formed by splitting the graph. Using a Minimum Spanning Tree (MST) for
divisive clustering ensures that we partition the data by removing the most significant
edges, breaking the dataset along its natural groupings.
Algorithm – Divisive Clustering using MST:
1. Represent the dataset as a graph, with nodes as data points and edges
weighted by the distance (e.g., Euclidean distance) between them.
2. Compute the Minimum Spanning Tree (MST) using algorithms like Prim’s
or Kruskal’s to connect all nodes with minimal edge weight.
3. Identify and remove the largest edge in the MST, breaking the graph into two
clusters.
4. Repeat the process recursively on each new subgraph until the required
number of clusters is obtained or stopping criteria are satisfied.
Python Implementation:
1. Import necessary libraries: networkx (for graph and MST operations) and
scipy (for distance computation).
2. Create a graph using the dataset and compute the MST.
3. Identify the edge with the largest weight and remove it to split the graph.
4. Continue splitting iteratively to form clusters, and visualize or evaluate the
clusters formed.
import networkx as nx
import scipy.spatial.distance as distance
import numpy as np
import matplotlib.pyplot as plt
Args:
data: A 2D NumPy array containing the data points.
num_clusters: The desired number of clusters.
Returns:
A list of clusters, where each cluster is a list of data point indices.
"""
# Example usage
data = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])
num_clusters = 3
clusters = divisive_clustering_mst(data, num_clusters)
print("Clusters:", clusters)
# Visualization
def plot_clusters(data, clusters):
plt.figure(figsize=(8, 6))
colors = ['r', 'g', 'b', 'y', 'c', 'm']
plot_clusters(data, clusters)
Conclusion:
⚫ Divisive clustering using MST provides an effective way to cluster data by
capturing the natural structure of the graph.
⚫ This approach works well when data points form clusters with varying densities
and shapes. Although computationally intensive for large datasets, MST-based
clustering ensures meaningful partitions that are difficult to achieve with
traditional distance-based methods.