Expt 8 Divisive Algorithm Based On MST 121A1091


PROGRAM NO. 8: DIVISIVE CLUSTERING ALGORITHM BASED ON MST

Aim: To implement the Divisive Clustering Algorithm based on a Minimum Spanning
Tree (MST) in Python and understand the graph-based clustering approach.

Theory:
Introduction and Concept:
Divisive Clustering is a top-down hierarchical clustering algorithm: the
entire dataset starts as a single cluster and is iteratively split into smaller clusters
until a stopping criterion is met. Graph-based clustering leverages the relationships
between data points by representing them as nodes connected by weighted edges;
clusters are then formed by splitting the graph. Using a Minimum Spanning Tree
(MST) for divisive clustering partitions the data by removing the heaviest edges,
breaking the dataset along its natural groupings.
Algorithm – Divisive Clustering using MST:
1. Represent the dataset as a graph, with nodes as data points and edges
weighted by the distance (e.g., Euclidean distance) between them.
2. Compute the Minimum Spanning Tree (MST) using algorithms like Prim’s
or Kruskal’s to connect all nodes with minimal edge weight.
3. Identify and remove the heaviest (largest-weight) edge in the MST, breaking
the graph into two clusters.
4. Repeat the process recursively on each new subgraph until the required
number of clusters is obtained or stopping criteria are satisfied.
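As a minimal sketch of steps 1–3 (using a toy three-point dataset chosen purely for illustration), the following builds the distance graph, extracts the MST, and identifies the edge a single split would remove:

```python
import numpy as np
import networkx as nx
from scipy.spatial import distance

# Toy data: two nearby points and one distant outlier
pts = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]])
d = distance.cdist(pts, pts)

# Step 1: complete graph weighted by Euclidean distance
G = nx.Graph()
for i in range(len(pts)):
    for j in range(i + 1, len(pts)):
        G.add_edge(i, j, weight=d[i, j])

# Step 2: MST (networkx defaults to Kruskal's algorithm)
mst = nx.minimum_spanning_tree(G)

# Step 3: the heaviest MST edge is the first one a divisive split removes
u, v, attrs = max(mst.edges(data=True), key=lambda e: e[2]["weight"])
print((u, v), attrs["weight"])  # the long edge separating the outlier
```

Removing that single edge leaves two connected components, i.e. the first two clusters.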
Python Implementation:
1. Import necessary libraries: networkx (for graph and MST operations) and
scipy (for distance computation).
2. Create a graph using the dataset and compute the MST.
3. Identify the edge with the largest weight and remove it to split the graph.
4. Continue splitting iteratively to form clusters, and visualize or evaluate the
clusters formed.
import networkx as nx
import scipy.spatial.distance as distance
import numpy as np
import matplotlib.pyplot as plt

def divisive_clustering_mst(data, num_clusters):
    """
    Performs divisive clustering using a Minimum Spanning Tree (MST).

    Args:
        data: A 2D NumPy array containing the data points.
        num_clusters: The desired number of clusters.

    Returns:
        A list of clusters, where each cluster is a list of data point indices.
    """
    # Create a graph with nodes representing data points
    G = nx.Graph()
    for i in range(len(data)):
        G.add_node(i)

    # Compute pairwise distances between data points
    distances = distance.cdist(data, data)

    # Add edges with weights representing distances
    for i in range(len(data)):
        for j in range(i + 1, len(data)):
            G.add_edge(i, j, weight=distances[i, j])

    # Compute the Minimum Spanning Tree (MST)
    mst = nx.minimum_spanning_tree(G)

    # Split the graph iteratively to form clusters
    clusters = [list(G.nodes())]
    while len(clusters) < num_clusters:
        # Find the edge with the largest weight in the MST
        max_edge = max(mst.edges(data=True), key=lambda edge: edge[2]['weight'])
        u, v, _ = max_edge

        # Remove the edge to split the graph
        mst.remove_edge(u, v)

        # Rebuild the clusters from the disconnected components
        clusters = [list(component) for component in nx.connected_components(mst)]

    return clusters

# Example usage
data = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])
num_clusters = 3
clusters = divisive_clustering_mst(data, num_clusters)
print("Clusters:", clusters)

# Visualization
def plot_clusters(data, clusters):
    plt.figure(figsize=(8, 6))
    colors = ['r', 'g', 'b', 'y', 'c', 'm']

    for cluster_idx, cluster in enumerate(clusters):
        cluster_points = data[cluster]
        plt.scatter(cluster_points[:, 0], cluster_points[:, 1],
                    label=f"Cluster {cluster_idx + 1}",
                    c=colors[cluster_idx % len(colors)])  # cycle colors if needed

    plt.title("Divisive Clustering using MST")
    plt.xlabel("Feature 1")
    plt.ylabel("Feature 2")
    plt.legend()
    plt.show()

plot_clusters(data, clusters)
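A useful cross-check: removing the k−1 heaviest MST edges yields the same partition as single-linkage hierarchical clustering cut into k clusters, since single-linkage merges follow MST edges in ascending weight order. The sketch below (with an illustrative toy dataset, not the one above) uses scipy's single-linkage as an independent reference:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

# Toy data with two tight pairs and one outlier (illustrative choice)
pts = np.array([[0, 0], [0.5, 0], [5, 0], [5.5, 0], [10, 0]])
k = 3

# Single-linkage agglomeration mirrors the MST, so cutting the
# dendrogram at k clusters matches removing the k-1 heaviest MST edges
Z = linkage(pdist(pts), method="single")
labels = fcluster(Z, t=k, criterion="maxclust")
print(labels)  # the two tight pairs group together; the outlier is alone
```

The labels produced this way should agree (up to relabeling) with the components returned by the divisive MST procedure on the same data.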
Conclusion:
⚫ Divisive clustering using an MST provides an effective way to cluster data by
capturing the natural neighborhood structure of the graph.
⚫ This approach works well when clusters have varying densities and shapes.
Although computationally intensive for large datasets (building the complete
distance graph is quadratic in the number of points), MST-based clustering can
recover partitions that centroid-based methods such as k-means often miss.
