
CUSTOMER SEGMENTATION USING MACHINE LEARNING

Anand Kasaudhan
Computer Science and Engineering
Galgotias University
Greater Noida, India
akdev.0811@gmail.com

Rahul Kumar Gupta
Computer Science and Engineering
Galgotias University
Greater Noida, India
rahulkumargupta8821@gmail.com

Dr S. Srinivasan
Computer Science and Engineering
Galgotias University
Greater Noida, India
s.srinivasan@galgotiasuniversity.edu.in

Abstract— Effective decisions are essential for any company to generate good revenue. Competition today is intense, and every company moves forward with its own strategy. Decisions should therefore be based on data. Every person is different, and we do not know in advance what he or she buys or likes. With the help of machine learning techniques, however, one can sort the data and find the target group by applying several algorithms to the dataset. Without such techniques, it is very difficult to find groups of people with similar characteristics and interests in a large dataset. Here, customer segmentation using K-Means clustering helps to group data with the same attributes, which is what helps the business most. We use the elbow method to find the number of clusters, and finally we visualize the data.

Keywords—K-means algorithm, Machine Learning, unsupervised learning, Customer segmentation, Clustering, Python

1. INTRODUCTION
The corporate sector has experienced significant growth in recent years. Businesses set new goals every day and make every effort to reach them. This has created a highly competitive environment in the corporate sector. Whether your company is small or big, you are competing with other companies. The problem is that many of these competitors are not successful. There are many reasons why businesses fail, but in our opinion, one of the biggest is that they choose not to learn from their customers. Any company has potential, but many do not understand the market. In short, companies do not segment the market. The solution to this problem is to understand customer segmentation (also known as market segmentation). Customer segmentation can be described as a game in which a child separates balls and cubes according to their shape and color. Simply put, customer segmentation means separating customers by different criteria and grouping them based on similar characteristics.

Why use customer segmentation now? Today's market is growing at a very fast rate, as is the number of customers. The smartphone revolution has brought the community closer together. Almost all businesses have moved to online platforms, expanding their reach to large customer groups. Customers are also happy to accept this change. Each customer also generates a large amount of data. So why should companies fall behind? Companies also need to change the way they work and use available resources to support growth. Many business goals can be achieved through customer segmentation. How do businesses benefit from this?

For example, suppose a company starts using customer segmentation and wants to group its customers by region. The company can then see which products are rated highest in which locations, and use this information to plan its advertising campaigns, strategies and more. Indirectly, this brings more profit to the business.

3. Literature Review

3.1 Customer Segmentation
The business world has been highly competitive for many years, and organizations need to grow their business and increase their profits by meeting customer demands and acquiring new customers. Identifying customers and meeting their needs is a very complex and time-consuming task, because customers differ in their requirements, tastes, preferences, etc. Instead of a "one-size-fits-all" approach, customer segmentation divides customers into groups that share the same characteristics or behavioral traits [5]. Customer segmentation is therefore a strategy for dividing the market into homogeneous groups.

3.2 Clustering and K-Means Algorithm
Clustering algorithms generate clusters such that the objects within a cluster are similar based on some characteristics. Similarity is defined in terms of how close the objects are in space. The K-means algorithm is one of the most popular centroid-based algorithms. Suppose a data set, D, contains n objects in space. Partitioning methods distribute the objects in D into k clusters, C1,...,Ck, that is, Ci ⊂ D and Ci ∩ Cj = ∅ for 1 ≤ i, j ≤ k, i ≠ j. A centroid-based partitioning technique uses the centroid of a cluster, Ci, to represent that cluster. Conceptually, the centroid of a cluster is its center point. The difference between an object p ∈ Ci and ci, the representative of the cluster, is measured by dist(p,ci), where dist(x,y) is the Euclidean distance between two points x and y.

Algorithm: A k-means algorithm for partitioning, where the center of each cluster is represented by the mean value of the objects in the cluster.
Input: k: number of clusters, D: dataset containing n objects.
Output: A set of k clusters.
Method:
(1) arbitrarily select k objects from D as initial cluster centers;
(2) repeat
(3) (re)assign each object to the cluster to which the object is most similar, based on the mean value of the objects in the cluster;
(4) update the cluster means, i.e., calculate the mean value of the objects for each cluster;
(5) until no change.
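The Method steps of the algorithm can be sketched in Python with NumPy. This is a minimal illustration, not the authors' implementation: the function name `kmeans`, the `max_iter` and `seed` parameters, the handling of empty clusters and the toy points are all assumptions introduced here.

```python
import numpy as np

def kmeans(points, k, max_iter=100, seed=0):
    """Partition an (n, d) array into k clusters; returns (labels, centers)."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # (1) arbitrarily select k objects from D as the initial cluster centers
    centers = points[rng.choice(len(points), size=k, replace=False)]
    labels = np.full(len(points), -1)
    for _ in range(max_iter):  # (2) repeat
        # (3) assign each object to the nearest center (Euclidean distance)
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # (5) stop as soon as no assignment changes
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # (4) update each center to the mean of the objects in its cluster
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # assumption: an empty cluster keeps its old center
                centers[j] = members.mean(axis=0)
    return labels, centers

# Hypothetical toy data: two well-separated groups of 2-D points.
toy = np.array([[1, 1], [1.5, 2], [1, 0.5], [8, 8], [9, 9.5], [8.5, 9]])
labels, centers = kmeans(toy, k=2)
print(labels)
print(centers)
```

Because the initial centers are chosen at random, the numeric cluster ids may differ between seeds even when the grouping itself is the same.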

4. Methodology
A mall store provided the dataset for clustering using the K-means algorithm. Five attributes and 200 tuples form a dataset that represents the information of 200 consumers. The attributes in the dataset are CustomerId, gender, age, annual income (k$), and spending score on a scale of 1-100.

Table 1. Dataset

To begin with, we need to clarify what data we will work with (see Table 1). We use a straightforward yet comprehensive dataset that includes customer ID, gender, age, annual income and spending score. The value of a customer's purchases or spending at the mall is represented by a spending score that ranges from 1 to 100 (the higher the number, the greater the amount spent). The structure of the dataset was displayed correctly and there are no null values.

If the dataset contains nulls, duplicates, or other noisy data, data cleaning is required. Data cleaning ensures that the information is reliable, usable and available for analysis. Once we have clean data, we can visualize it by comparing gender-specific annual income and spending scores. The plot shows five groups of customers, characterized by their annual income and spending score:
1. High income / low spending score
2. Low income / low spending score
3. Low income / high spending score
4. Average income / average spending score
5. High income / high spending score

Figure 1. Annual Income vs Spending Score

Now we can build a K-means model, knowing that there are several groups but not yet exactly how many. K-means clustering is performed for a range of k clusters (say 1 to 10), and for each value we compute the sum of the squared distances from each point to its assigned center, i.e. the WCSS (Within-Cluster Sum of Squares). The silhouette coefficient approach can then be used to decide on the number of clusters that gives the best silhouette score. We notice that once K=5 is reached, there is no rapid movement in WCSS. Given this, K=5 is the appropriate number of clusters. Refer to Figure 2.

Figure 2. Silhouette approach result.
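The selection of k described above can be sketched with scikit-learn. This is only an illustration under stated assumptions: the data below is a synthetic stand-in with five hand-placed groups, not the mall dataset, and the variable names (`X`, `wcss`, `silhouette`, `best_k`) are chosen here for the example. The silhouette score is computed only for k ≥ 2, since it is undefined for a single cluster.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the (annual income k$, spending score) columns:
# five blobs placed by hand, NOT the mall dataset used in the paper.
rng = np.random.default_rng(42)
blob_centers = np.array([[25, 20], [25, 80], [55, 50], [90, 15], [90, 85]])
X = np.vstack([c + rng.normal(scale=4.0, size=(40, 2)) for c in blob_centers])

wcss = {}        # elbow method: within-cluster sum of squares for each k
silhouette = {}  # silhouette coefficient for each k (undefined for k = 1)
for k in range(1, 11):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss[k] = model.inertia_  # sum of squared distances to the assigned centers
    if k > 1:
        silhouette[k] = silhouette_score(X, model.labels_)

# Pick the k with the highest silhouette score and print both criteria.
best_k = max(silhouette, key=silhouette.get)
for k in wcss:
    print(k, round(wcss[k], 1), round(silhouette.get(k, float("nan")), 3))
print("k with the best silhouette score:", best_k)
```

Plotting `wcss` against k gives the elbow curve: the value of k after which the curve flattens is the candidate number of clusters, and the silhouette scores offer a second check on that choice.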

We can divide the plot into groups, determine which clusters can be prioritized, and then assign a label to each using the method stated above. The K-means approach can be used to decide which of the five clusters should be targeted, namely clients with Moderate Income-Moderate Spending Score, High Income-High Spending Score, and Low Income-High Spending Score. The target consumers have been located, as shown in Figure 3.

Figure 3. Final cluster of customers

5. EXPERIMENT RESULTS
Mall shoppers can be divided into five groups based on their annual income and spending. First, the yellow group refers to people who have high incomes and high spending scores; this group is an excellent target for a mall or retail center because these are the most profitable customers. They may be frequent shoppers who are easily won over by the mall's facilities.

The blue group, on the other hand, consists of those who have a lot of money but spend very little. This is an interesting case because there are many possible reasons for the emergence of such a group. Perhaps they are people who like to shop but are not satisfied with the current offers or facilities of the mall. They are good targets too, but we have to find out why they are spending so little. A department head or the mall authority could design or build facilities to attract this group and meet their needs.

The orange group has average earnings and average spending. We can assume that these are people who do not always buy things, but have a strong desire to spend despite their financial limits. A manager would generally avoid marketing strategies that target this population, because they do not represent a significant source of income for the shopping center. However, a number of data analysis techniques can be used to help increase their spending.

The purple group includes people with low income but high spending scores; despite their low income, people in this group like or are interested in spending money. This is also possible if customers are satisfied with the services of the mall and therefore feel inclined to spend money.

The green group, the fifth, has low annual incomes and low spending scores. It makes sense that they are on a tight budget and cut corners wherever possible, which is a sensible decision given their circumstances. People in this cluster should be given the lowest priority by the mall manager.

By analyzing data, we can predict customer behavior based on annual income and spending scores. This cluster analysis can be applied to a number of consumer marketing methods. We want to keep our target clientele, who have a high income and a high spending score, because they provide the largest profit margin. Customers with a high income and a low spending score may be attracted to the mall supermarket by the wide variety of items their lifestyle demands. Customers with a low income and a low spending score can be given more promotions and will be tempted to spend by receiving frequent offers and discounts. Cluster analysis can be used to determine what clients wish to consume, allowing more targeted marketing efforts to be developed. Potential clients in this situation are people in groups 3 and 4.

6. CONCLUSION
This study demonstrates that customer segmentation in shopping malls is achievable. This form of machine learning application is highly useful in the market: a manager can concentrate on each cluster that has been discovered and meet its requirements. Mall managers must be able to understand what customers require and, more importantly, how to meet those needs, analyze their purchasing habits, and establish frequent interactions with customers that make them feel comfortable in order to satisfy their demands.

REFERENCES
[1] "Customer segmentation based on survival character," IEEE, Jul. 2003.
[2] "Customer Segmentation Using K Means Clustering," Towards Data Science, Apr. 2019.
[3] Peter J. Rousseeuw (1987). "Silhouettes: a Graphical Aid to the Interpretation and Validation of Cluster Analysis". Computational and Applied Mathematics. 20: 53–65. doi:10.1016/0377-0427(87)90125-7.
[4] R.C. de Amorim, C. Hennig (2015). "Recovering the number of clusters in data sets with noise features using feature rescaling factors". Information Sciences. 324: 126–145. arXiv:1602.06989. doi:10.1016/j.ins.2015.06.039.
[5] Leonard Kaufman; Peter J. Rousseeuw (1990). Finding groups in data: An introduction to cluster analysis. Hoboken, NJ: Wiley-Interscience. p. 87. doi:10.1002/9780470316801. ISBN 9780471878766.
[6] Kriegel, Hans-Peter; Schubert, Erich; Zimek, Arthur (2016). "The (black) art of runtime evaluation: Are we comparing algorithms or implementations?". Knowledge and Information Systems. 52 (2): 341–378. doi:10.1007/s10115-016-1004-2. ISSN 0219-1377. S2CID 40772241.
[7] Fader, P. S., Hardie, B. G., & Lee, K. L. (2005). RFM and CLV: Using iso-value curves for customer base analysis. Journal of Marketing Research, 42(4), 415-430.
[8] Tkachenko, Yegor. Autonomous CRM Control via CLV Approximation with Deep Reinforcement Learning in Discrete and Continuous Action Space. (April 8, 2015). arXiv.org: https://arxiv.org/abs/1504.01840
[9] Yeh, I-Cheng, Yang, King-Jang, and Ting, Tao-Ming, "Knowledge discovery on RFM model using Bernoulli sequence," Expert Systems with Applications, 2009.
[10] Robert L. Thorndike (December 1953). "Who Belongs in the Family?". Psychometrika. 18 (4): 267–276. doi:10.1007/BF02289263.
[11] Williamson, D & Parker, RA & Kendrick, Juliette. (1989). The box plot: A simple visual method to interpret data. Annals of internal medicine. 110. 916-21. 10.1059/0003-4819-110-11-916.
[12] Bhaya, Wesam. (2017). Review of Data Preprocessing Techniques in Data Mining. Journal of Engineering and Applied Sciences. 12. 4102-4107. 10.3923/jeasci.2017.4102.4107.
[13] Li, Youguo & Wu, Haiyan. (2012). A Clustering Method Based on K-Means Algorithm. Physics Procedia. 25. 1104-1109. 10.1016/j.phpro.2012.03.206.
[14] Wei, Jo-Ting & Lin, Shih-Yen & Wu, Hsin-Hung. (2010). A review of the application of RFM model. African Journal of Business Management December Special Review. 4. 4199-4206.
