
2024 IEEE 9th International Conference for Convergence in Technology (I2CT)

Pune, India. Apr 5-7, 2024

Customer Segmentation Using Hierarchical Clustering
Areeba Afzal1, Laiba Khan2, Muhammad Zunnurain Hussain3, Muhammad Zulkifl Hasan4, Muzzamil Mustafa5, Aqsa Khalid6,
Rimsha Awan7, Farhan Ashraf8, Zohaib Ahmed Khan9, Arslan Javaid10
1,2Department of Computer Engineering Information Technology University Lahore, Punjab, Pakistan
2024 IEEE 9th International Conference for Convergence in Technology (I2CT) | 979-8-3503-9447-4/24/$31.00 ©2024 IEEE | DOI: 10.1109/I2CT61223.2024.10543349

3Assistant Professor, Dept. of Computer Science, Bahria University Lahore Campus


4Department of Computer Science, Faculty of Information Technology, University of Central Punjab Lahore Pakistan
5Department of Computer Science, National College of Business Administration and Economics, Lahore, Pakistan
6Information Technology University Lahore, Pakistan
7Department of Computer Science National College of Business Administration and Economics, Lahore, Pakistan
8Dept. of Computer Science, Bahria University Lahore Campus
9Department of Computer Science National College of Business Administration & Economics, Lahore, Pakistan
10Department of Computer Science National College of Business Administration and Economics, Lahore, Pakistan
1bsce20017@itu.edu.pk, 2bsce20035@itu.edu.pk, 3Zunnurain.bulc@bahria.edu.pk, 4Zulkifl.hasan@ucp.edu.pk,
5muzzamil.mustafa@umt.edu.pk, 6msds19046@itu.edu.pk, 7rimshaawan.225@gmail.com, 8farhanashrafali30@gmail.com,
9zohaibkhanmcitp@gmail.com, 10Arslanravian97@gmail.com

Abstract: In the dynamic landscape of retail, understanding customer behavior is paramount for
businesses seeking to optimize marketing strategies and enhance the shopping experience. This study
explores the utilization of hierarchical clustering techniques for mall customer segmentation, with
a focus on the paper titled 'MALL CUSTOMER SEGMENTATION USING MACHINE LEARNING' as the benchmark.
Our dataset encompasses a diverse range of mall customers, spanning demographics and behavioral
attributes. Hierarchical clustering systematically groups customers into clusters, revealing
distinct segments within the mall's customer base. A comprehensive analysis of these clusters
unveils profound insights into customer tendencies, preferences, and purchasing habits. These
insights form a solid foundation for tailored marketing campaigns, personalized recommendations,
and resource allocation within the mall. The study contributes significantly to customer analytics,
providing retailers with a powerful tool to gain a competitive edge in the retail sector. By
leveraging hierarchical clustering for mall customer segmentation, businesses can enhance customer
satisfaction, drive sales, and foster lasting customer relationships. This paper underscores the
importance of data-driven methodologies in understanding customer behavior and offers a practical
framework for businesses to harness the potential of hierarchical clustering for strategic
decision-making in the retail industry.

Keywords: Hierarchical Clustering, Customer Segmentation, Data Mining Techniques, Market
Segmentation, Cluster Analysis, Agglomerative Clustering Dendrogram, Machine Learning in Marketing,
Consumer Behavior Analysis, Multivariate Data Analysis, Customer Data Clustering, Behavioral
Segmentation, Marketing Strategy, Customer Profiling, Predictive Analytics in CRM

I. INTRODUCTION

Understanding customer behavior is a fundamental pursuit for businesses in the dynamic retail
sector. The ability to dissect and categorize customers into distinct groups, known as customer
segmentation, plays a pivotal role in shaping marketing strategies and enhancing shopping
experiences. This research project ventures into the application of hierarchical clustering
techniques within the context of mall customers, a diverse and multifaceted consumer demographic.
Malls attract a wide range of individuals, each with their unique preferences, demographics, and
behaviors. Through the utilization of hierarchical clustering, an advanced data-driven approach,
this study aims to unveil valuable insights that can revolutionize how businesses interact with
their mall customers.

The essence of this research lies in its potential to uncover hidden patterns and preferences
governing customer choices within the mall environment. As businesses grapple with fierce
competition in the retail arena, the ability to segment customers effectively and tailor marketing
strategies to specific segments holds the promise of a substantial advantage. This paper not only
introduces the concept of hierarchical clustering as a potent tool for customer segmentation but
also explores its adaptability to diverse mall customer datasets.

In the pages that follow, we delve into the methodology, results, and implications of applying
hierarchical clustering to mall customer data. By doing so, we aim to provide businesses in the
retail industry with a practical resource to enhance customer satisfaction, drive sales, and foster
enduring customer relationships in an ever-evolving retail landscape.

II. BACKGROUND AND RELATED WORK

A. Background
Understanding customer behavior through segmentation is a fundamental strategy across various
industries. This process involves categorizing customers into distinct groups based on shared
characteristics, enabling businesses to tailor their approaches to meet specific needs and
preferences. Malls and shopping centers serve as vibrant hubs attracting a diverse range of
customers, each with their unique demographics, shopping habits, and preferences. Yet, the
application of hierarchical clustering to segment mall customers remains a relatively unexplored
area. This study aims to address this gap by evaluating the suitability and effectiveness of
hierarchical clustering in segmenting mall customers. The insights gained will have implications
not only for retail but also for various sectors seeking to enhance their understanding of diverse
customer bases and optimize their strategies accordingly.

B. Related Work
Customer segmentation and clustering techniques have been extensively studied across various
domains, providing



Authorized licensed use limited to: Zhejiang University. Downloaded on November 15,2024 at 15:06:04 UTC from IEEE Xplore. Restrictions apply.
valuable insights into understanding customer behavior and enhancing business strategies. In the
context of customer segmentation:

Traditional Segmentation Methods: Traditional methods, including demographic, geographic, and
psychographic segmentation, have long been used in marketing. These approaches categorize customers
based on factors like age, location, income, and lifestyle. While informative, they may oversimplify
the complex nature of customer behavior.

K-Means Clustering: K-means clustering is a widely employed technique that groups data points into
clusters based on similarity. In the field of customer segmentation, K-means has been used to
identify distinct customer groups. However, it assumes spherical clusters and requires specifying
the number of clusters beforehand.

DBSCAN and Density-Based Clustering: Density-Based Spatial Clustering of Applications with Noise
(DBSCAN) identifies clusters based on data density. It is particularly useful for discovering
irregularly shaped clusters. In customer segmentation, DBSCAN has been applied to uncover
less-defined customer groups.

Hierarchical Clustering in Customer Segmentation: Hierarchical clustering, unlike K-means and
DBSCAN, creates a hierarchical structure of clusters, offering a visual representation of the
relationships between clusters. Its adaptability to various data shapes and its ability to capture
hierarchical relationships make it a promising approach for customer segmentation.

Customer Segmentation in Retail: Numerous studies have explored customer segmentation within the
retail sector, focusing on different clustering techniques and data sources. However, the specific
application of hierarchical clustering to mall customer data remains an underexplored avenue.

Machine Learning and Predictive Analytics: Beyond clustering, machine learning techniques, such as
decision trees, random forests, and neural networks, have been applied to predict customer behavior
and preferences. These methods offer predictive capabilities to inform targeted marketing and
personalized recommendations.

III. METHODOLOGY

This study advances customer segmentation from k-means to hierarchical clustering. The dataset,
sourced from "Mall Customers.csv," underwent preliminary exploration. Univariate clustering was
initiated based on 'Annual Income (k)' using Agglomerative Clustering with three clusters. A
dendrogram visually captured the hierarchical structure of income segments. Building on this,
bivariate clustering incorporated both 'Annual Income (k)' and 'Spending Score (1-100)' with five
clusters. The Agglomerative Clustering algorithm provided insights into customer segments
considering both income and spending. The resulting hierarchical relationships were depicted
through a dendrogram.

Multivariate clustering ensued, integrating 'Age,' 'Annual Income (k),' 'Spending Score (1-100),'
and 'Genre Male.' Categorical features were numerically transformed, and data standardized with
StandardScaler. Hierarchical clustering with five clusters and a dendrogram visualization deepened
the understanding of complex relationships.

Post-clustering, a detailed analysis of each cluster's characteristics was conducted. A scatter
plot visually portrayed the distribution of clusters in a two-dimensional space, plotting 'Age'
against 'Annual Income (k).' The derived cluster labels were appended to the original dataset,
saved as 'cluster.csv,' along with additional details like the maximum customer ID.

In summary, this methodology signifies a transition to hierarchical clustering for more nuanced
customer segmentation, capturing intricate hierarchical relationships within the data.

IV. EASE OF USE

Here are the key considerations to ensure accessibility and simplicity:

Data Collection and Preprocessing: Streamline data collection procedures to minimize errors and
inconsistencies. Develop clear guidelines for data preprocessing, including handling missing values
and outliers, and provide automated tools or scripts for data cleaning.

Hierarchical Clustering Tools: Utilize user-friendly software or programming libraries for
hierarchical clustering, offering an intuitive interface for researchers or practitioners to apply
clustering algorithms without requiring extensive coding expertise.

Visualization: Implement visualization tools that generate dendrograms and cluster visualizations
in a straightforward manner. These visual representations should be interpretable for non-technical
stakeholders.

Parameter Tuning: Simplify the process of parameter tuning for hierarchical clustering algorithms.
Provide clear guidance on choosing appropriate distance metrics and linkage methods, along with
automated tools to assist in selection.

Cluster Interpretation: Develop user-friendly dashboards or reports that enable easy interpretation
of cluster results. Summarize the characteristics and behaviors of each cluster in a comprehensible
format.

Validation Metrics: Include automated validation metrics within the clustering process to assist
users in assessing cluster quality. Provide explanations and guidelines on how to interpret these
metrics.

Documentation: Offer comprehensive documentation that covers the entire research process, from data
collection to interpretation of results. Include step-by-step tutorials, code samples, and examples
for reference.

User Support: Establish a support system to assist users with questions or issues they may
encounter during the clustering process. Provide contact information for assistance when needed.

Training and Workshops: Consider organizing training sessions or workshops to educate users on the
use of hierarchical clustering for mall customer segmentation. These sessions can help bridge
knowledge gaps and enhance usability.

Feedback Loop: Encourage users to provide feedback on the tools and processes used for hierarchical
clustering. Regularly update and improve the resources based on user input to enhance ease of use
continually.
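
As one possible form of the automated assistance for linkage selection mentioned under Parameter Tuning, the following minimal sketch compares several linkage methods by their cophenetic correlation coefficient. This criterion is an assumption chosen for illustration and is not stated in the study; the helper name `compare_linkage_methods` and the synthetic data are likewise hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import pdist


def compare_linkage_methods(X, methods=("ward", "complete", "average", "single")):
    """Return {method: cophenetic correlation} using Euclidean distances on X."""
    distances = pdist(X, metric="euclidean")  # condensed pairwise distance vector
    scores = {}
    for method in methods:
        Z = linkage(X, method=method, metric="euclidean")
        # cophenet returns (correlation, cophenetic distances); keep the correlation
        coefficient, _ = cophenet(Z, distances)
        scores[method] = float(coefficient)
    return scores


# Synthetic stand-in data (hypothetical), not the mall customer dataset.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(30, 2)) for c in (0.0, 4.0, 8.0)])

linkage_scores = compare_linkage_methods(X)
best_method = max(linkage_scores, key=linkage_scores.get)
```

A higher coefficient means the dendrogram preserves the original pairwise distances more faithfully. It is only one heuristic, so the choice should still be checked against the dendrogram and a validation metric such as the Silhouette Score.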
V. SYSTEM DIAGRAM AND FLOWCHART
Fig. 1 shows the flowchart of the customer segmentation process using hierarchical clustering. The
process starts by preparing the data and then applies univariate, bivariate, and multivariate
clustering to group customers on different sets of features. It then visualizes the clusters,
evaluates their quality using the Silhouette Score, and presents several visual representations for
analyzing the segments before the segmentation process concludes.
The system architecture in Fig. 2 starts with user interaction through the User Interface, followed
by data preprocessing to clean and structure the data. The distance metric chosen in the Distance
Metric Selector determines the Linkage Matrix that the Hierarchical Clustering Engine uses to form
clusters. Evaluation, visualization, and labeling stages then produce the final output, which
summarizes and presents the clustered data for user interpretation and further analysis.
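
The stages described above can be sketched roughly as follows. This is a minimal illustration on synthetic data, assuming Euclidean distance and average linkage; the function name `segment` and its parameters are hypothetical and do not come from the study.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def segment(features, n_clusters, metric="euclidean", method="average"):
    """Preprocess, build a linkage matrix, cut it into clusters, and score the result."""
    scaled = StandardScaler().fit_transform(features)          # data preprocessing
    distances = pdist(scaled, metric=metric)                   # distance metric selector
    Z = linkage(distances, method=method)                      # linkage matrix
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")   # hierarchical clustering engine
    score = silhouette_score(scaled, labels, metric=metric)    # evaluation
    return labels, Z, score


# Synthetic stand-in data (hypothetical): three well-separated groups.
rng = np.random.default_rng(1)
features = np.vstack([rng.normal(loc=c, scale=0.5, size=(40, 2)) for c in (0.0, 5.0, 10.0)])

labels, Z, score = segment(features, n_clusters=3)
```

Keeping the distance metric and linkage method as parameters mirrors the separate Distance Metric Selector and Linkage Matrix stages, so either can be swapped without touching the rest of the flow.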

Fig. 2. System Architecture

VI. RESULTS AND EVALUATION


In this section, we present the results and evaluations
derived from our hierarchical clustering approach applied to
customer segmentation. Our analysis progresses through
univariate, bivariate, and multivariate clustering, providing a
comprehensive understanding of customer behavior within
the dataset.
A. Data Loading and Exploration
The initial stage of our investigation involved loading the customer data from the file
"Mall Customers.csv" into a Pandas DataFrame. This facilitated a comprehensive exploration of
the dataset's structure and attributes. The dataset encompasses essential information such as age,
annual income, spending score, and gender. By loading the data into a DataFrame, we laid the
foundation for subsequent analyses aimed at unraveling patterns and relationships among customer
features. The exploration primarily involved univariate and multivariate clustering analyses,
employing hierarchical clustering methods.
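
A loading and exploration step of this kind might look like the sketch below. The rows are invented for illustration and the column names follow the feature names quoted in this paper; the actual file may differ, and `load_customers` is a hypothetical helper.

```python
import io

import pandas as pd

# Hypothetical rows for illustration only; the study reads "Mall Customers.csv" instead.
SAMPLE_CSV = """CustomerID,Genre,Age,Annual Income (k),Spending Score (1-100)
1,Male,19,15,39
2,Female,21,15,81
3,Female,35,60,50
4,Male,48,61,42
5,Female,31,120,79
6,Male,40,126,28
"""


def load_customers(source):
    """Read customer records from a path or file-like object into a DataFrame."""
    return pd.read_csv(source)


df = load_customers(io.StringIO(SAMPLE_CSV))

summary = df.describe()                      # summary statistics for numeric columns
missing = df.isna().sum()                    # missing values per column
gender_counts = df["Genre"].value_counts()   # distribution of the categorical column
```

Passing a file path such as "Mall Customers.csv" to `load_customers` in place of the in-memory sample would read the real data.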
Fig. 1. Flowchart

B. Univariate Clustering - Annual Income
Univariate clustering focused on customer behavior exclusively tied to annual income, utilizing the
Agglomerative Clustering algorithm with three clusters. This step aimed to distill patterns in
income distribution, offering a foundational grasp of income-based customer segments. The primary
objective was to identify distinct groups with similar income levels.
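
A minimal sketch of income-only clustering with three clusters, using scikit-learn's `AgglomerativeClustering`, is given below. The income values are synthetic, the output column name 'Income Cluster' is hypothetical, and Ward linkage is assumed for the clustering step.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import AgglomerativeClustering

# Synthetic incomes (hypothetical), standing in for the 'Annual Income (k)' column.
rng = np.random.default_rng(42)
income = np.concatenate([
    rng.normal(20, 3, 40),
    rng.normal(60, 5, 80),
    rng.normal(110, 8, 40),
])
df = pd.DataFrame({"Annual Income (k)": income})

# Ward linkage is an assumption here; it is also scikit-learn's default.
model = AgglomerativeClustering(n_clusters=3, linkage="ward")
df["Income Cluster"] = model.fit_predict(df[["Annual Income (k)"]])

# Per-cluster size and income range, to read off the income segments.
profile = df.groupby("Income Cluster")["Annual Income (k)"].agg(["count", "min", "mean", "max"])
```

The `profile` table gives the size and income range of each segment, which is the kind of summary needed to label clusters as low, middle, or high income.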

Visualizing Univariate Clustering:
A dendrogram is generated to visually represent the hierarchical relationships among individuals
based on their annual income. Utilizing the 'ward' linkage method, this visualization provides a
clear depiction of the hierarchical clustering structure, aiding in the interpretation of income
segments (see Fig. 3).

Fig. 3. Univariate Clustering Dendrogram

C. Bivariate Clustering - Income and Spending Score
Building upon univariate clustering, bivariate clustering is performed by considering both
'Annual Income (k)' and 'Spending Score (1-100).' The Agglomerative Clustering algorithm with five
clusters is employed to identify more nuanced patterns by incorporating spending behavior along
with income.

Visualizing Bivariate Clustering:
Similar to the univariate clustering, a dendrogram is created to visually interpret the
hierarchical relationships between income and spending score clusters. This visualization provides
insights into how customers cluster based on both income and spending behavior (see Fig. 4).

Fig. 4. Bivariate Clustering Dendrogram

D. Multivariate Clustering
A more comprehensive approach is taken by considering multiple features: 'Age,' 'Annual Income
(k),' 'Spending Score (1-100),' and 'Genre Male.' These features are preprocessed, scaled, and used
in the Agglomerative Clustering algorithm with five clusters, aiming to capture complex
relationships among multiple dimensions.

Visualizing Multivariate Clustering:
The hierarchical structure resulting from multivariate clustering is visually represented through
another dendrogram. This visualization aids in understanding the complex relationships among the
selected features, offering a holistic view of customer segmentation based on various dimensions
(see Fig. 5).

Fig. 5. Multivariate Clustering Dendrogram

E. Analyzing and Visualizing Cluster from Results
A count is conducted to determine the number of individuals in each cluster resulting from the
multivariate analysis. Subsequently, a scatter plot is generated, plotting 'Age' against
'Annual Income (k),' providing a two-dimensional visualization of the clusters. This visual
representation allows for an intuitive interpretation of the segmentation results, as in Fig. 6.
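
The multivariate step and the per-cluster count can be sketched as follows. The data are randomly generated stand-ins, the 'Genre' source column and the 0/1 encoding are assumptions about how 'Genre Male' was produced, and Ward linkage is assumed for both the clustering and the dendrogram.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.cluster import AgglomerativeClustering
from sklearn.preprocessing import StandardScaler

# Randomly generated stand-in data (hypothetical), not the mall customer dataset.
rng = np.random.default_rng(7)
n = 200
df = pd.DataFrame({
    "Genre": rng.choice(["Male", "Female"], size=n),
    "Age": rng.integers(18, 71, size=n),
    "Annual Income (k)": rng.integers(15, 138, size=n),
    "Spending Score (1-100)": rng.integers(1, 101, size=n),
})

# Numeric encoding of the categorical column (1 = Male, 0 = Female); assumed scheme.
df["Genre Male"] = (df["Genre"] == "Male").astype(int)
features = ["Age", "Annual Income (k)", "Spending Score (1-100)", "Genre Male"]
X = StandardScaler().fit_transform(df[features])

# Five clusters as in the text; Ward linkage is an assumption.
df["Cluster"] = AgglomerativeClustering(n_clusters=5, linkage="ward").fit_predict(X)
cluster_sizes = df["Cluster"].value_counts().sort_index()

# Linkage matrix for the dendrogram; no_plot=True computes the layout without drawing.
Z = linkage(X, method="ward")
tree = dendrogram(Z, no_plot=True)
```

Dropping `no_plot=True` draws the dendrogram with matplotlib, and a scatter plot of 'Age' against 'Annual Income (k)' colored by `df["Cluster"]` gives the two-dimensional view described above.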

F. Dataset Augmentation and Export
The cluster labels derived from the multivariate analysis were appended to the original dataset,
resulting in a new dataset named 'cluster.csv.' Additional details, such as the maximum customer
ID, were also extracted and documented.

G. Silhouette Score for Multivariate Clustering Evaluation
To assess the efficacy of our multivariate clustering, we employed the Silhouette Score, a metric
gauging the compactness and separation of clusters. The score ranges from -1 to 1, with higher
values indicating more compact and better-separated clusters. The multivariate clustering yielded a
Silhouette Score of 0.28699413201651747 (approximately 0.287).

Fig. 6. Clusters from Hierarchical Clustering

Fig. 7. Univariate Clustering

H. Histogram and Scatterplot Visualisations
To enhance the interpretability of our results, univariate and bivariate visualizations were
generated. The histogram illustrates the distribution of annual income across clusters derived from
univariate clustering (see Fig. 6). Additionally, the scatter plot visualizes the bivariate
clustering of annual income and spending score, providing a clearer representation of the segmented
clusters (see Fig. 7 and Fig. 8).

I. Principal Component Analysis (PCA)
In order to visualize the multivariate clustering in a reduced two-dimensional space, Principal
Component Analysis (PCA) was applied. The 2D PCA plot visually represents the clustering patterns,
offering insights into the relationships among customers in the context of the selected features
(see Fig. 9).

Fig. 8. Bivariate Clustering

Fig. 9. 2D PCA

VII. CONCLUSION AND FUTURE WORK

In conclusion, the application of hierarchical clustering for customer segmentation, as
demonstrated in this study and benchmarked against 'MALL CUSTOMER SEGMENTATION USING MACHINE
LEARNING,' has provided valuable insights into diverse customer behaviours. The hierarchical
approach, spanning univariate, bivariate, and multivariate analyses, has proven effective in
capturing hierarchical structures and revealing relationships among distinct customer segments. By
dissecting customer attributes such as annual income, spending score, age, and gender, the
hierarchical model, supported by visualizations like dendrograms and scatter plots, has facilitated
a nuanced understanding of the data. The incorporation of the Silhouette Score, in conjunction with
visualizations, has further enriched our understanding of the identified clusters. These
segmentation outcomes directly inform the refinement of marketing strategies and the tailoring of
services to meet the specific needs of diverse customer groups, ultimately enhancing satisfaction
and loyalty.

Looking ahead, potential areas for future exploration include optimizing feature selection and
engineering, exploring alternative algorithms, and incorporating temporal elements, ensuring
continual responsiveness to the dynamic landscape of customer preferences and behaviours in the
evolving retail industry.

