Lecture - 7 - Practical - DBSCAN Clustering in Python

The document describes using DBSCAN clustering in Python to analyze a customer dataset from Kaggle containing customer age, gender, income, and spending score. It loads the dataset, selects relevant features to cluster, runs DBSCAN with hyperparameters to identify 5 clusters and 1 outlier, and visualizes the clusters in scatter plots of annual income vs spending score and age vs spending score.

Uploaded by

prerna sharma

DBSCAN Clustering in Python

1. Select an arbitrary unvisited point p. It is called a core point if its eps-neighborhood contains at least minPts data points (including p itself).

2. Use eps and minPts to identify all points that are density-reachable from p.

3. If p is a core point, create a cluster from p and the points density-reachable from it.

4. If p is not a core point, move on to the next data point. A point is called a border point if it has fewer than minPts points in its neighborhood but lies within eps of a core point; a point that is not within eps of any core point is treated as noise (an outlier).

5. Continue until all points have been visited.
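
The core, border, and noise definitions in the steps above can be sketched with NumPy. This is a minimal illustration under stated assumptions, not scikit-learn's implementation: the function name `classify_points` and the sample points are hypothetical, distances are Euclidean, and each point counts as its own neighbor.

```python
import numpy as np

def classify_points(X, eps, min_pts):
    """Label each row of X as 'core', 'border', or 'noise' (illustrative helper)."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances between all points.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # neighbors[i, j] is True when point j lies within eps of point i
    # (the diagonal is True, so each point counts itself).
    neighbors = dists <= eps
    # Core points have at least min_pts points in their eps-neighborhood.
    is_core = neighbors.sum(axis=1) >= min_pts
    # Border points are not core but have at least one core point within eps.
    near_core = (neighbors & is_core[None, :]).any(axis=1)
    return np.where(is_core, "core", np.where(near_core, "border", "noise"))

# Hypothetical sample: a tight square, one point on its edge, one far away.
points = [[0, 0], [0, 1], [1, 0], [1, 1], [2.2, 1], [10, 10]]
kinds = classify_points(points, eps=1.5, min_pts=4)
print(list(kinds))
```

In this sample the four corners of the square are core points, the point at (2.2, 1) is a border point because it is within eps of only one core point, and (10, 10) is noise.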

We will be using the Deepnote notebook to run the example. It comes with pre-installed Python packages, so we just have to import NumPy, pandas, seaborn, matplotlib, and sklearn.

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.cluster import DBSCAN

We are using the Mall Customer Segmentation Data (https://www.kaggle.com/datasets/vjchoudhary7/customer-segmentation-tutorial-in-python) from Kaggle. It contains customers' age, gender, income, and spending score. We will be using these features to create various clusters.

First, we will load the dataset using pandas `read_csv`. Then, we will select three columns ('Age', 'Annual Income (k$)', 'Spending Score (1-100)') to create the X_train dataframe.

df = pd.read_csv('Mall_Customers.csv')
X_train = df[['Age', 'Annual Income (k$)', 'Spending Score (1-100)']]

We will fit the DBSCAN algorithm on X_train with eps=12.5 and min_samples=4. After that, we will create DBSCAN_dataset as a copy of X_train and add a 'Cluster' column from clustering.labels_.

clustering = DBSCAN(eps=12.5, min_samples=4).fit(X_train)
DBSCAN_dataset = X_train.copy()
DBSCAN_dataset.loc[:,'Cluster'] = clustering.labels_
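
To make the meaning of `labels_` concrete, here is a small sketch on made-up points rather than the mall data. The array `toy` and the values eps=1.5 and min_samples=4 are assumptions chosen only so the result is easy to check by hand: two tight squares of four points each, plus one isolated point.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical toy data: two squares of four points and one distant point.
toy = np.array([
    [0, 0], [0, 1], [1, 0], [1, 1],
    [10, 10], [10, 11], [11, 10], [11, 11],
    [50, 50],
])

# Every corner of a unit square has all four corners within eps=1.5,
# so each square forms one cluster; the distant point has no neighbors.
toy_clustering = DBSCAN(eps=1.5, min_samples=4).fit(toy)
toy_labels = toy_clustering.labels_
print(toy_labels)
```

Cluster labels are non-negative integers, and the isolated point receives the label -1, which is how scikit-learn marks noise.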

To inspect the distribution of clusters, we will use value_counts() and convert the result into a dataframe.

The output shows 5 clusters and 1 outlier group, which scikit-learn labels -1. The `0` cluster is the largest, with 112 rows.

DBSCAN_dataset.Cluster.value_counts().to_frame()
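
The same summary can be computed directly from a label array. The sketch below uses a hypothetical `labels` array in place of clustering.labels_, so the counts are illustrative and are not the results for the mall dataset.

```python
import numpy as np

# Hypothetical labels, shaped like clustering.labels_; -1 marks outliers.
labels = np.array([0, 0, 1, -1, 1, 2, 0, -1, 2, 2])

# Size of every label group, including the outlier group.
unique, counts = np.unique(labels, return_counts=True)
sizes = dict(zip(unique.tolist(), counts.tolist()))

# Number of real clusters excludes the -1 label.
n_clusters = len(set(labels.tolist()) - {-1})
n_outliers = int((labels == -1).sum())

print(sizes, n_clusters, n_outliers)
```

Counting clusters separately from the -1 group avoids reporting the outliers as an extra cluster.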

In this section, we will use the above information to draw scatter plots of the clusters.

There are two plots: "Annual Income vs. Spending Score" and "Age vs. Spending Score." The clusters are distinguished by color, and the outliers are drawn as small black dots.

The visualization shows how each customer that is not an outlier falls into one of the 5 clusters, and we can use this information to give high-end offers to customers in the purple cluster and cheaper offers to customers in the dark green cluster.

# Rows labeled -1 are the points DBSCAN marked as outliers.
outliers = DBSCAN_dataset[DBSCAN_dataset['Cluster'] == -1]
clustered = DBSCAN_dataset[DBSCAN_dataset['Cluster'] != -1]

fig2, axes = plt.subplots(1, 2, figsize=(12, 5))

# Recent seaborn versions require x and y as keyword arguments.
sns.scatterplot(x='Annual Income (k$)', y='Spending Score (1-100)',
                data=clustered, hue='Cluster', ax=axes[0],
                palette='Set2', legend='full', s=200)

sns.scatterplot(x='Age', y='Spending Score (1-100)',
                data=clustered, hue='Cluster', ax=axes[1],
                palette='Set2', legend='full', s=200)

# Overlay the outliers as small black dots.
axes[0].scatter(outliers['Annual Income (k$)'], outliers['Spending Score (1-100)'],
                s=10, label='outliers', c='k')
axes[1].scatter(outliers['Age'], outliers['Spending Score (1-100)'],
                s=10, label='outliers', c='k')

axes[0].legend()
axes[1].legend()

plt.setp(axes[0].get_legend().get_texts(), fontsize='12')
plt.setp(axes[1].get_legend().get_texts(), fontsize='12')

plt.show()
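
Alongside the plots, a per-cluster average of each feature can help decide which cluster should get which offer. The sketch below is only an illustration: `demo` is a hypothetical stand-in for DBSCAN_dataset, and its values are made up.

```python
import pandas as pd

# Hypothetical stand-in for DBSCAN_dataset; the numbers are invented.
demo = pd.DataFrame({
    'Annual Income (k$)': [15, 16, 80, 85, 78, 120],
    'Spending Score (1-100)': [80, 76, 90, 88, 12, 50],
    'Cluster': [0, 0, 1, 1, 2, -1],
})

# Drop the outlier label, then average each feature per cluster.
profile = (demo[demo['Cluster'] != -1]
           .groupby('Cluster')
           .mean()
           .round(1))
print(profile)
```

In this made-up data, cluster 1 combines high income with a high spending score, which is the kind of group the high-end offers would target.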
