DataCamp Customer Segmentation in Python
CUSTOMER SEGMENTATION IN PYTHON
Introduction to RFM segmentation
Karolis Urbonas
Head of Data Science, Amazon
What is RFM segmentation?
Behavioral customer segmentation based on three metrics:
Recency (R)
Frequency (F)
Monetary Value (M)
Grouping RFM values
The RFM values can be grouped in several ways:
Percentiles e.g. quantiles
Pareto 80/20 cut
Custom - based on business knowledge
We are going to implement percentile-based grouping.
Short review of percentiles
Process of calculating percentiles:
1. Sort customers based on that metric
2. Break customers into a pre-defined number of groups of equal size
3. Assign a label to each group
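The three steps above can be sketched by hand before reaching for a library helper. This is a minimal illustration, assuming made-up spend values and four groups; the names `spend`, `n_groups`, and `labels` are hypothetical.

```python
import pandas as pd

# Hypothetical spend values for eight customers (index = customer number)
spend = pd.Series([10, 80, 30, 60, 20, 70, 40, 50], index=range(1, 9), name="Spend")
n_groups = 4

# 1. Sort customers based on the metric
ordered = spend.sort_values()

# 2. Break customers into equal-sized groups by their position in the sorted order
positions = pd.Series(range(len(ordered)), index=ordered.index)
group_index = positions * n_groups // len(ordered)

# 3. Assign a label to each group (1 = lowest spend, 4 = highest spend)
labels = group_index + 1

print(pd.DataFrame({"Spend": ordered, "Label": labels}))
```

With eight customers and four groups, each label ends up on exactly two customers.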
Calculate percentiles with Python
Data with eight CustomerID values and randomly generated Spend values.
Calculate percentiles with Python
spend_quartiles = pd.qcut(data['Spend'], q=4, labels=range(1,5))
data['Spend_Quartile'] = spend_quartiles
data.sort_values('Spend')
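For a version that runs on its own, the sketch below builds a small DataFrame first. The spend values are invented for illustration and are not the ones shown on the slide.

```python
import pandas as pd

# Hypothetical data: eight customers with made-up spend values
data = pd.DataFrame({
    "CustomerID": range(1, 9),
    "Spend": [137, 335, 172, 355, 303, 233, 244, 229],
})

# Split Spend into four equal-sized groups labelled 1 (lowest) to 4 (highest)
spend_quartiles = pd.qcut(data["Spend"], q=4, labels=list(range(1, 5)))
data["Spend_Quartile"] = spend_quartiles

# Sorting by Spend makes the quartile boundaries easy to see
print(data.sort_values("Spend"))
```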
Assigning labels
Assign the highest score to the best metric value; best is not always highest, e.g. recency
For recency the labels are inverted: the more recent the customer, the better
Assigning labels
# Create numbered labels
r_labels = list(range(4, 0, -1))
# Divide into groups based on quartiles
recency_quartiles = pd.qcut(data['Recency_Days'], q=4, labels=r_labels)
# Create new column
data['Recency_Quartile'] = recency_quartiles
# Sort recency values from lowest to highest
data.sort_values('Recency_Days')
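A self-contained sketch of the reversed labelling, assuming invented `Recency_Days` values for eight customers:

```python
import pandas as pd

# Hypothetical data: days since each customer's last purchase
data = pd.DataFrame({
    "CustomerID": range(1, 9),
    "Recency_Days": [37, 235, 396, 72, 255, 393, 203, 133],
})

# Labels run from 4 down to 1, so the smallest recency gets the highest score
r_labels = list(range(4, 0, -1))
data["Recency_Quartile"] = pd.qcut(data["Recency_Days"], q=4, labels=r_labels)

print(data.sort_values("Recency_Days"))
```

The customer who bought 37 days ago lands in quartile 4, while the one who bought 396 days ago lands in quartile 1.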
Assigning labels
As you can see, the quartile labels are reversed, since the more recent customers
are more valuable.
Custom labels
We can define a list of strings or any other values, depending on the use case.
# Create string labels
r_labels = ['Active', 'Lapsed', 'Inactive', 'Churned']
# Divide into groups based on quartiles
recency_quartiles = pd.qcut(data['Recency_Days'], q=4, labels=r_labels)
# Create new column
data['Recency_Quartile'] = recency_quartiles
# Sort values from lowest to highest
data.sort_values('Recency_Days')
Custom labels
Custom labels assigned to each quartile
Let's practice with percentiles!
Recency, Frequency, Monetary Value calculation
Definitions
Recency - days since last customer transaction
Frequency - number of transactions in the last 12 months
Monetary Value - total spend in the last 12 months
Dataset and preparations
Same online dataset as in the previous lessons
Need to do some data preparation
New TotalSum column = Quantity x UnitPrice.
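The new column is a row-wise product. A minimal sketch, assuming a tiny made-up `online` DataFrame with the same column names:

```python
import pandas as pd

# Hypothetical invoice lines (values are illustrative only)
online = pd.DataFrame({
    "InvoiceNo": [1001, 1001, 1002],
    "CustomerID": [1, 1, 2],
    "Quantity": [2, 1, 5],
    "UnitPrice": [3.5, 10.0, 1.5],
})

# Total value of each invoice line
online["TotalSum"] = online["Quantity"] * online["UnitPrice"]

print(online)
```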
Data preparation steps
We're starting with a pre-processed online DataFrame with only the latest 12
months of data:
print('Min:{}; Max:{}'.format(min(online.InvoiceDate),
max(online.InvoiceDate)))
Min:2010-12-10; Max:2011-12-09
Let's create a hypothetical snapshot_date, as if the analysis were run one day after the most recent invoice.
snapshot_date = max(online.InvoiceDate) + datetime.timedelta(days=1)
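The snapshot line needs the `datetime` module imported and a datetime-typed `InvoiceDate` column. A small sketch under those assumptions, with made-up dates that match the range printed above:

```python
import datetime

import pandas as pd

# Hypothetical invoice dates spanning the same 12-month window
online = pd.DataFrame({
    "InvoiceDate": pd.to_datetime(["2010-12-10", "2011-06-15", "2011-12-09"]),
})

# One day after the latest invoice, so the most recent customer has Recency = 1
snapshot_date = online["InvoiceDate"].max() + datetime.timedelta(days=1)

print(snapshot_date)
```

Adding one day means no customer ends up with a recency of zero days.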
Calculate RFM metrics
# Aggregate data on a customer level
datamart = online.groupby(['CustomerID']).agg({
    'InvoiceDate': lambda x: (snapshot_date - x.max()).days,
    'InvoiceNo': 'count',
    'TotalSum': 'sum'})
# Rename columns for easier interpretation
datamart.rename(columns={'InvoiceDate': 'Recency',
                         'InvoiceNo': 'Frequency',
                         'TotalSum': 'MonetaryValue'}, inplace=True)
# Check the first rows
datamart.head()
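The same aggregation can be tried end to end on a few invented invoice lines. This sketch uses pandas named aggregation instead of a dict plus `rename`, which is a stylistic substitution and not the slide's exact code; all data values are hypothetical.

```python
import pandas as pd

# Hypothetical invoice lines for two customers
online = pd.DataFrame({
    "CustomerID": [1, 1, 2, 2, 2],
    "InvoiceNo": [1001, 1001, 1002, 1003, 1003],
    "InvoiceDate": pd.to_datetime(
        ["2011-12-01", "2011-12-01", "2011-11-09", "2011-12-09", "2011-12-09"]
    ),
    "TotalSum": [7.0, 10.0, 5.0, 2.5, 4.0],
})
snapshot_date = online["InvoiceDate"].max() + pd.Timedelta(days=1)

# One row per customer: days since last purchase, row count, total spend
datamart = online.groupby("CustomerID").agg(
    Recency=("InvoiceDate", lambda x: (snapshot_date - x.max()).days),
    Frequency=("InvoiceNo", "count"),
    MonetaryValue=("TotalSum", "sum"),
)

# 'count' counts invoice lines; distinct invoices would need 'nunique' instead
distinct_invoices = online.groupby("CustomerID")["InvoiceNo"].nunique()

print(datamart)
print(distinct_invoices)
```

Whether Frequency should count invoice lines or distinct invoices is a modelling choice; the slide code counts lines, and the last statement only shows what the alternative would give.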
Final RFM values
Our table for RFM segmentation is completed!
Let's practice calculating RFM values!
Building RFM segments
Data
Dataset we created previously
We will calculate a quartile value for each column and name them R, F, M
Recency quartile
r_labels = range(4, 0, -1)
r_quartiles = pd.qcut(datamart['Recency'], 4, labels = r_labels)
datamart = datamart.assign(R = r_quartiles.values)
Frequency and Monetary quartiles
f_labels = range(1,5)
m_labels = range(1,5)
f_quartiles = pd.qcut(datamart['Frequency'], 4, labels = f_labels)
m_quartiles = pd.qcut(datamart['MonetaryValue'], 4, labels = m_labels)
datamart = datamart.assign(F = f_quartiles.values)
datamart = datamart.assign(M = m_quartiles.values)
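The three quartile assignments can be checked together on a small table. The `datamart` values below are invented, and every value is distinct so that each quartile receives exactly two customers.

```python
import pandas as pd

# Hypothetical customer-level RFM values
datamart = pd.DataFrame(
    {
        "Recency": [2, 40, 75, 120, 200, 260, 310, 360],
        "Frequency": [50, 30, 22, 15, 9, 6, 3, 1],
        "MonetaryValue": [900.0, 650.0, 400.0, 320.0, 150.0, 90.0, 45.0, 10.0],
    },
    index=pd.Index(range(1, 9), name="CustomerID"),
)

# Recency is reversed (low is good); Frequency and MonetaryValue are not
r_quartiles = pd.qcut(datamart["Recency"], 4, labels=list(range(4, 0, -1)))
f_quartiles = pd.qcut(datamart["Frequency"], 4, labels=list(range(1, 5)))
m_quartiles = pd.qcut(datamart["MonetaryValue"], 4, labels=list(range(1, 5)))

datamart = datamart.assign(
    R=r_quartiles.values, F=f_quartiles.values, M=m_quartiles.values
)

print(datamart)
```

On real data with many tied values (for example, many customers with a single purchase), `pd.qcut` can raise an error about non-unique bin edges, so the input may need extra handling in that case.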
Build RFM Segment and RFM Score
Concatenate RFM quartile values to RFM_Segment
Sum RFM quartile values to RFM_Score
def join_rfm(x): return str(x['R']) + str(x['F']) + str(x['M'])
datamart['RFM_Segment'] = datamart.apply(join_rfm, axis=1)
datamart['RFM_Score'] = datamart[['R','F','M']].sum(axis=1)
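As an alternative sketch, the segment string and the score can be built with column-wise operations after casting the categorical quartiles to integers. The explicit cast is a defensive assumption here, since arithmetic on categorical columns can behave differently across pandas versions; the three customers are invented.

```python
import pandas as pd

# Hypothetical quartile labels, stored as categoricals like pd.qcut output
datamart = pd.DataFrame(
    {"R": [4, 3, 1], "F": [4, 2, 1], "M": [3, 2, 1]},
    index=pd.Index([1, 2, 3], name="CustomerID"),
).astype("category")

# Cast to plain integers once, then reuse for both derived columns
rfm = datamart[["R", "F", "M"]].astype(int)

# Concatenate the three digits into a segment code such as '443'
datamart["RFM_Segment"] = (
    rfm["R"].astype(str) + rfm["F"].astype(str) + rfm["M"].astype(str)
)

# Sum the three digits into a single score between 3 and 12
datamart["RFM_Score"] = rfm.sum(axis=1)

print(datamart)
```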
Final result
Let's practice building RFM segments
Analyzing RFM segments
Largest RFM segments
datamart.groupby('RFM_Segment').size().sort_values(ascending=False)[:10]
Filtering on RFM segments
Select the bottom RFM segment "111" and view the first 5 rows
datamart[datamart['RFM_Segment']=='111'][:5]
Summary metrics per RFM Score
datamart.groupby('RFM_Score').agg({
'Recency': 'mean',
'Frequency': 'mean',
'MonetaryValue': ['mean', 'count'] }).round(1)
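The analysis steps so far (largest segments, filtering one segment, summary per score) can be tried on a toy table. The five customers below are invented, and the segment and score columns are filled in by hand.

```python
import pandas as pd

# Hypothetical datamart with segment and score already assigned
datamart = pd.DataFrame({
    "Recency": [5, 10, 200, 300, 310],
    "Frequency": [40, 30, 4, 1, 2],
    "MonetaryValue": [800.0, 600.0, 90.0, 20.0, 10.0],
    "RFM_Segment": ["444", "444", "222", "111", "111"],
    "RFM_Score": [12, 12, 6, 3, 3],
})

# Segment sizes, largest first
largest = datamart.groupby("RFM_Segment").size().sort_values(ascending=False)

# Customers in the bottom segment
bottom = datamart[datamart["RFM_Segment"] == "111"]

# Average R, F, M and customer count for each score
summary = datamart.groupby("RFM_Score").agg({
    "Recency": "mean",
    "Frequency": "mean",
    "MonetaryValue": ["mean", "count"],
}).round(1)

print(largest)
print(bottom)
print(summary)
```

Because `MonetaryValue` has two aggregations, the result has two-level column names such as `("MonetaryValue", "count")`.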
Grouping into named segments
Use RFM score to group customers into Gold, Silver and Bronze segments.
def segment_me(df):
    if df['RFM_Score'] >= 9:
        return 'Gold'
    elif (df['RFM_Score'] >= 5) and (df['RFM_Score'] < 9):
        return 'Silver'
    else:
        return 'Bronze'
datamart['General_Segment'] = datamart.apply(segment_me, axis=1)
datamart.groupby('General_Segment').agg({
'Recency': 'mean',
'Frequency': 'mean',
'MonetaryValue': ['mean', 'count']
}).round(1)
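The same thresholds can also be expressed with `pd.cut`, which avoids a row-wise `apply`. This is an alternative sketch and not the slide's method; it assumes integer scores between 3 and 12, and the sample scores are made up.

```python
import pandas as pd

# Hypothetical RFM scores covering each threshold boundary
scores = pd.Series([3, 4, 5, 8, 9, 12], name="RFM_Score")

# Right-inclusive bins: (2, 4] Bronze, (4, 8] Silver, (8, 12] Gold
segments = pd.cut(scores, bins=[2, 4, 8, 12], labels=["Bronze", "Silver", "Gold"])

print(pd.DataFrame({"RFM_Score": scores, "General_Segment": segments}))
```

For integer scores these bins give the same result as the function above: 9 and higher is Gold, 5 to 8 is Silver, and anything lower is Bronze.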
New segments and their values
Practice building custom segments