DBSCAN Clustering in ML _ Density Based Clustering

DBSCAN is a density-based clustering algorithm that identifies clusters of arbitrary shapes and effectively handles noise and outliers, unlike K-Means which assumes spherical clusters. Key parameters include 'eps' for neighborhood radius and 'MinPts' for minimum points to form a dense region. DBSCAN categorizes points as core, border, or noise and uses a series of steps to form clusters based on density connectivity.

Uploaded by

GOBINDA PRADHAN 076

DBSCAN Clustering in ML | Density based clustering

Last Updated : 29 Jan, 2025

DBSCAN is a density-based clustering algorithm that groups data points that are closely
packed together and marks outliers as noise based on their density in the feature space. It
identifies clusters as dense regions in the data space, separated by areas of lower density.

Unlike K-Means or hierarchical clustering, which assume clusters are compact and spherical,
DBSCAN excels in handling real-world data irregularities such as:

Arbitrary-Shaped Clusters: Clusters can take any shape, not just circular or convex.
Noise and Outliers: It effectively identifies and handles noise points without assigning them
to any cluster.

Figure: DBSCAN Clustering in ML | Density based clustering

The figure above compares clustering algorithms on a data set: K-Means and
hierarchical clustering handle compact, spherical clusters with varying noise tolerance, while
DBSCAN manages arbitrary-shaped clusters and excels at noise handling.

Key Parameters in DBSCAN

1. eps: This defines the radius of the neighborhood around a data point.

If the distance between two points is less than or equal to eps, they are considered neighbors.
Choosing the right eps is crucial:

If eps is too small, most points will be classified as noise.

If eps is too large, clusters may merge, and the algorithm may fail to distinguish between
them.

A common method to determine eps is by analyzing the k-distance graph.

2. MinPts: This is the minimum number of points required within the eps radius to form a
dense region.

A general rule of thumb is to set MinPts >= D+1, where D is the number of dimensions in the
dataset. For most cases, a minimum value of MinPts = 3 is recommended.
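The k-distance heuristic mentioned above can be sketched in plain Python. This is a minimal illustration, assuming Euclidean distance and k set to MinPts; the function name k_distances and the sample points are hypothetical and not part of the original article.

```python
import math

def k_distances(points, k):
    """Return the distance from each point to its k-th nearest neighbor,
    sorted in ascending order (the values plotted in a k-distance graph)."""
    result = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result)

# Hypothetical data: two tight groups of four points plus one isolated point
points = [(0.0, 0.0), (0.0, 0.1), (0.1, 0.0), (0.1, 0.1),
          (5.0, 5.0), (5.0, 5.1), (5.1, 5.0), (5.1, 5.1),
          (10.0, 0.0)]
kd = k_distances(points, k=3)
print(kd)
```

In the sorted list, the values stay small for points inside dense groups and jump sharply for the isolated point. A value just below that jump (the "elbow" of the plotted curve) is a reasonable starting candidate for eps.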

How Does DBSCAN Work?

DBSCAN works by categorizing data points into three types:
1. Core points, which have at least MinPts neighbors within the specified radius (epsilon, i.e. eps).
2. Border points, which lie within eps of a core point but lack enough neighbors to be core points
themselves.
3. Noise points, which are neither core nor border points and do not belong to any cluster.

By iteratively expanding clusters from core points and connecting density-reachable points,
DBSCAN forms clusters without relying on rigid assumptions about their shape or size.
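The three point types can be illustrated with a short sketch. It assumes Euclidean distance and counts a point as its own neighbor (a common convention, but an assumption here); classify_points and the sample data are hypothetical names chosen for illustration.

```python
import math

def classify_points(points, eps, min_pts):
    """Label each point 'core', 'border', or 'noise'.
    Assumes Euclidean distance and counts a point as its own neighbor."""
    n = len(points)
    # Indices of all points within eps of each point (including itself)
    neighbors = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    is_core = [len(nb) >= min_pts for nb in neighbors]
    kinds = []
    for i in range(n):
        if is_core[i]:
            kinds.append("core")
        elif any(is_core[j] for j in neighbors[i]):
            kinds.append("border")   # not dense itself, but within eps of a core point
        else:
            kinds.append("noise")
    return kinds

# Hypothetical data: a small square, one point just beside it, one far away
points = [(0.0, 0.0), (0.0, 0.5), (0.5, 0.0), (0.5, 0.5), (1.1, 0.5), (5.0, 5.0)]
kinds = classify_points(points, eps=0.75, min_pts=4)
print(kinds)
```

The four corners of the square come out as core points, the point beside the square as a border point (it is within eps of one corner only), and the distant point as noise.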

Steps in the DBSCAN Algorithm

1. Identify Core Points: For each point in the dataset, count the number of points within
its eps neighborhood. If the count meets or exceeds MinPts, mark the point as a core point.
2. Form Clusters: For each core point that is not already assigned to a cluster, create a new
cluster. Recursively find all density-reachable points (points that can be reached through a
chain of core points, each within the eps radius of the next) and add them to the cluster.
3. Density Connectivity: Two points, a and b, are density-connected if there exists a core
point o from which both are density-reachable, that is, each can be reached from o through a
chain of points in which every point is within the eps radius of the next and every point
except possibly the last is a core point. This chaining ensures that all points in a cluster are
connected through a series of dense regions.
4. Label Noise Points: After processing all points, any point that does not belong to a cluster is
labeled as noise.

Pseudocode For DBSCAN Clustering Algorithm

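DBSCAN(dataset, eps, MinPts) {
    # cluster index
    C = 0
    for each unvisited point p in dataset {
        mark p as visited
        # find neighbors
        N = all points within eps of p (including p)
        if |N| < MinPts:
            mark p as noise        # may later be claimed as a border point
        else:
            C = C + 1
            add p to cluster C
            for each point p' in N {
                if p' is not visited:
                    mark p' as visited
                    N' = all points within eps of p'
                    if |N'| >= MinPts:
                        N = N U N'
                if p' is not a member of any cluster:
                    add p' to cluster C
            }
    }
}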
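The steps and pseudocode above can be turned into a small from-scratch version. This is a minimal sketch, not the scikit-learn implementation: it assumes Euclidean distance, uses a brute-force neighbor search, and counts a point as its own neighbor. The names region_query, dbscan, and NOISE are illustrative.

```python
import math
from collections import deque

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [None] * len(points)      # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE          # may be relabeled as a border point later
            continue
        cluster += 1                   # i is a core point: start a new cluster
        labels[i] = cluster
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster    # noise reached from a core point becomes border
            if labels[j] is not None:
                continue               # already assigned, do not expand again
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)   # j is core: keep expanding the cluster
    return labels

# Hypothetical data: two small squares, one border point, one isolated point
points = [(0.0, 0.0), (0.0, 0.5), (0.5, 0.0), (0.5, 0.5), (1.1, 0.5),
          (5.0, 5.0), (5.0, 5.5), (5.5, 5.0), (5.5, 5.5), (10.0, 10.0)]
labels = dbscan(points, eps=0.75, min_pts=4)
print(labels)
```

The explicit queue replaces the recursion described in the steps, which avoids deep call stacks on large clusters. The brute-force region_query makes the sketch quadratic in the number of points; practical implementations typically use a spatial index instead.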

Implementation Of DBSCAN Algorithm In Python

Here, we’ll use the Python library sklearn to compute DBSCAN. We’ll also use the
matplotlib.pyplot library for visualizing clusters.

Import Libraries

import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn import metrics
from sklearn.datasets import make_blobs

Prepare dataset

We will create a dataset using sklearn for modeling. We use make_blobs to create the dataset.

# Load data in X
X, y_true = make_blobs(n_samples=300, centers=4,
                       cluster_std=0.50, random_state=0)

Modeling The Data Using DBSCAN

db = DBSCAN(eps=0.3, min_samples=10).fit(X)
labels = db.labels_

# Boolean mask that is True for core samples
core_samples_mask = np.zeros_like(labels, dtype=bool)
core_samples_mask[db.core_sample_indices_] = True

# Number of clusters in labels, ignoring noise (label -1) if present.
n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)

# Plot result: one color per label, so no cluster is dropped
# when more labels are found than a fixed color list would hold.
unique_labels = sorted(set(labels))
colors = [plt.cm.Spectral(each)
          for each in np.linspace(0, 1, len(unique_labels))]
for k, col in zip(unique_labels, colors):
    if k == -1:
        # Black used for noise.
        col = 'k'

    class_member_mask = (labels == k)

    # Core points of this cluster
    xy = X[class_member_mask & core_samples_mask]
    plt.plot(xy[:, 0], xy[:, 1], 'o', markerfacecolor=col,
             markeredgecolor='k', markersize=6)

    # Border points of this cluster (or noise points when k == -1)
    xy = X[class_member_mask & ~core_samples_mask]
    plt.plot(xy[:, 0], xy[:, 1], 'o', markerfacecolor=col,
             markeredgecolor='k', markersize=6)

plt.title('number of clusters: %d' % n_clusters_)
plt.show()

Output:

Cluster of dataset

Evaluation Metrics For DBSCAN Algorithm In Machine Learning

We will use the Silhouette score and the Adjusted Rand Index for evaluating clustering algorithms.

The Silhouette score ranges from -1 to 1. A score near 1 is best: it means that a
data point is compact within the cluster to which it belongs and far away from the
other clusters. The worst value is -1. Values near 0 denote overlapping clusters.
The Adjusted Rand Index (ARI) has a maximum of 1, reached when the clustering matches the
true labels exactly. Values near 0 correspond to random labeling, and the score can be negative
when agreement is worse than random. More than 0.9 denotes excellent cluster
recovery, and above 0.8 is a good recovery. Less than 0.5 is considered to be poor recovery.

# evaluation metrics
sc = metrics.silhouette_score(X, labels)
print("Silhouette Coefficient:%0.2f" % sc)
ari = metrics.adjusted_rand_score(y_true, labels)
print("Adjusted Rand Index: %0.2f" % ari)
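To make the Adjusted Rand Index less of a black box, here is a minimal sketch that computes it from the contingency table of two labelings using the standard pair-counting formula. It is written in plain Python for illustration and assumes at least two samples; the function name adjusted_rand_index is hypothetical and is not the scikit-learn function. A noise label such as -1 is simply treated as one more group.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index computed from the contingency table of two labelings."""
    n = len(labels_true)
    pair_counts = Counter(zip(labels_true, labels_pred))   # cell counts n_ij
    true_counts = Counter(labels_true)                      # row sums a_i
    pred_counts = Counter(labels_pred)                      # column sums b_j

    sum_ij = sum(comb(c, 2) for c in pair_counts.values())
    sum_a = sum(comb(c, 2) for c in true_counts.values())
    sum_b = sum(comb(c, 2) for c in pred_counts.values())

    expected = sum_a * sum_b / comb(n, 2)   # index expected under random labeling
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0   # degenerate case, e.g. both labelings use a single cluster
    return (sum_ij - expected) / (max_index - expected)

same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])     # same partition, renamed labels
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])  # disagrees on every pair
print(same, crossed)
```

The first call gives 1.0 because the score ignores how the clusters are named. The second gives about -0.5, which shows that the index is not confined to the range 0 to 1.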

Output:

Silhouette Coefficient:0.13
Adjusted Rand Index: 0.31

Black points represent outliers. By changing eps and MinPts, we can change the cluster
configuration. This raises the question of when DBSCAN is the better choice.

When Should We Use DBSCAN Over K-Means In Clustering Analysis?

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) and K-Means are both
clustering algorithms that group together data with similar characteristics. However, they
work on different principles and are suitable for different types of data. We prefer to use
DBSCAN when the clusters are not spherical in shape or the number of clusters is not known
beforehand.

DBSCAN | K-Means

In DBSCAN we need not specify the number of clusters. | K-Means is very sensitive to the number of clusters, so it needs to be specified.

Clusters formed in DBSCAN can be of any arbitrary shape. | Clusters formed in K-Means are spherical or convex in shape.

DBSCAN can work well with datasets having noise and outliers. | K-Means does not work well with outliers. Outliers can skew the clusters in K-Means to a very large extent.

In DBSCAN two parameters are required for training the model. | In K-Means only one parameter is required for training the model.

In short, DBSCAN is the better fit when the data calls for clusters of arbitrary shape and
effective handling of noise. K-Means, on the other hand, is better suited for data with
well-defined, spherical clusters and is less effective with noise or complex cluster structures.
