Heatmaps
MARKET BASKET ANALYSIS IN PYTHON
Isaiah Hull
Economist
MovieLens dataset
import pandas as pd
# Load ratings data.
ratings = pd.read_csv('datasets/movie_ratings.csv')
print(ratings.head())
userId movieId title
0 3149 54286 Bourne Ultimatum, The (2007)
1 3149 1220 Blues Brothers, The (1980)
2 3149 4007 Wall Street (1987)
3 3149 7156 Fog of War: Eleven...
4 3149 97304 Argo (2012)
Creating "transactions" from ratings
# Recover unique user IDs.
user_id = ratings['userId'].unique()
# Create library of highly rated movies for each user.
libraries = [list(ratings[ratings['userId'] == u].title) for u in user_id]
# Print example library.
print(libraries[0])
['Battlestar Galactica (2003)',
'Gorgon, The (1964)',
'Under the Skin (2013)',
'Upstream Color (2013)',
'Destry Rides Again (1939)',
'Dr. Phibes Rises Again (1972)']
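The list comprehension above filters the full ratings table once per user. A minimal sketch of an alternative, assuming the same `userId` and `title` columns, groups the table a single time instead. The toy rows below are made up for illustration.

```python
import pandas as pd

# Toy ratings table; the rows are illustrative, not from the MovieLens file.
ratings = pd.DataFrame({
    'userId': [1, 1, 2, 2, 2],
    'title': ['Argo (2012)', 'Wall Street (1987)',
              'Argo (2012)', 'Blues Brothers, The (1980)', 'Wall Street (1987)'],
})

# Collect each user's titles in one pass.
# sort=False keeps users in order of first appearance,
# which matches the order of ratings['userId'].unique().
libraries = ratings.groupby('userId', sort=False)['title'].apply(list).tolist()

# Print example library.
print(libraries[0])
```

Both approaches should produce the same list of lists; the grouped version mainly helps when there are many users.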
One-hot encoding transactions
from mlxtend.preprocessing import TransactionEncoder
# Instantiate transaction encoder.
encoder = TransactionEncoder()
# One-hot encode libraries.
onehot = encoder.fit(libraries).transform(libraries)
# Use movie titles as column headers.
onehot = pd.DataFrame(onehot, columns = encoder.columns_)
# Print onehot header.
print(onehot.head())
   (500) Days of Summer (2009)  .45 (2006)  10 Things I Hate About You (1999)
0                        False       False                              False
1                        False       False                              False
2                        False       False                              False
3                        False       False                              False
4                        False       False                              False
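To make the encoding step concrete, here is a minimal sketch that builds the same kind of Boolean table with plain pandas. It is meant to mirror what the transaction encoder produces, not to replace it. The three toy libraries and the name `onehot_check` are assumptions for illustration.

```python
import pandas as pd

# Toy libraries; illustrative only.
libraries = [['Argo (2012)', 'Wall Street (1987)'],
             ['Argo (2012)'],
             ['Blues Brothers, The (1980)', 'Wall Street (1987)']]

# Sorted list of every distinct title across all libraries.
titles = sorted({title for library in libraries for title in library})

# One row per library, one Boolean column per title.
onehot_check = pd.DataFrame(
    [[title in library for title in titles] for library in libraries],
    columns=titles,
)

print(onehot_check)
```

Each row is a user and each column is a movie, so a column's mean is the share of users whose library contains that movie.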
What is a heatmap?
Preparing the data
1. Generate the rules.
Use Apriori algorithm and association rules.
2. Convert antecedents and consequents into strings.
Stored as frozen sets by default in mlxtend.
3. Convert rules into matrix format.
Suitable for use in heatmaps.
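Steps 2 and 3 can be sketched on a hand-written rules table, so the shape of the resulting matrix is visible without running Apriori. The two titles appear elsewhere in these slides; the support values are made up.

```python
import pandas as pd

# Stand-in for the output of association_rules(); support values are invented.
rules = pd.DataFrame({
    'antecedents': [frozenset({'Batman Begins (2005)'}),
                    frozenset({'Dark Knight Rises, The (2012)'})],
    'consequents': [frozenset({'Dark Knight Rises, The (2012)'}),
                    frozenset({'Batman Begins (2005)'})],
    'support': [0.12, 0.12],
})

# Step 2: convert frozen sets into strings.
# sorted() gives a stable order if a set holds more than one item.
for col in ['antecedents', 'consequents']:
    rules[col] = rules[col].apply(lambda items: ','.join(sorted(items)))

# Step 3: one row per consequent, one column per antecedent.
support_table = rules.pivot(index='consequents', columns='antecedents',
                            values='support')

print(support_table)
```

Cells with no matching rule come out as NaN, which a heatmap typically leaves blank.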
from mlxtend.frequent_patterns import association_rules, apriori
import seaborn as sns
# Apply the apriori algorithm
frequent_itemsets = apriori(onehot, min_support=0.10,
use_colnames=True, max_len=2)
# Recover the association rules
rules = association_rules(frequent_itemsets)
Generating a heatmap
# Convert antecedents and consequents into strings
rules['antecedents'] = rules['antecedents'].apply(lambda a: ','.join(list(a)))
rules['consequents'] = rules['consequents'].apply(lambda a: ','.join(list(a)))
# Print example.
print(rules[['antecedents','consequents']])
antecedents consequents
0 Batman Begins (2005) Dark Knight Rises, The (2012)
# Transform antecedent, consequent, and support columns into matrix
support_table = rules.pivot(index='consequents', columns='antecedents',
values='support')
# Generate heatmap
sns.heatmap(support_table)
Customizing heatmaps
sns.heatmap(support_table, annot=True, cbar=False, cmap='ocean')
Let's practice!
Scatterplots
Introduction to scatterplots
A scatterplot displays pairs of values.
Antecedent and consequent support.
Confidence and lift.
No model is assumed.
No trend line or curve needed.
Can provide starting point for pruning.
Identify patterns in data and rules.
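The metrics that end up on the scatterplot axes can be computed by hand for a single rule. A minimal sketch for a rule A -> B, assuming a one-hot table with made-up columns `A` and `B`:

```python
import pandas as pd

# Toy one-hot data: five transactions, two items. Illustrative only.
onehot = pd.DataFrame({
    'A': [True, True, True, False, False],
    'B': [True, True, False, True, False],
})

# Share of transactions containing each side of the rule.
antecedent_support = onehot['A'].mean()
consequent_support = onehot['B'].mean()

# Share of transactions containing both A and B.
support = (onehot['A'] & onehot['B']).mean()

# Confidence: how often B appears when A does.
confidence = support / antecedent_support

# Lift: confidence relative to B's baseline frequency.
lift = confidence / consequent_support

print(antecedent_support, consequent_support, support, confidence, lift)
```

Each rule contributes one such set of numbers, so plotting any two of them gives one point per rule.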
Support versus confidence
1 Bayardo Jr., R.J. and Agrawal, R. (1999). Mining the Most Interesting Rules. In Proceedings of the Fifth ACM SIGKDD
International Conference on Knowledge Discovery and Data Mining (pp. 145-154).
Generating a scatterplot
import pandas as pd
import seaborn as sns
from mlxtend.frequent_patterns import association_rules, apriori
# Load one-hot encoded MovieLens data
onehot = pd.read_csv('datasets/movies_onehot.csv')
# Generate frequent itemsets using Apriori
frequent_itemsets = apriori(onehot, min_support=0.01, use_colnames=True, max_len=2)
# Generate association rules
rules = association_rules(frequent_itemsets, metric='support', min_threshold=0.0)
sns.scatterplot(x="antecedent support", y="consequent support", data=rules)
Adding a third metric
sns.scatterplot(x="antecedent support",
y="consequent support",
size="lift",
data=rules)
What can we learn from scatterplots?
Identify natural thresholds in data.
Not possible with heatmaps or other visualizations.
Visualize entire dataset.
Not limited to small number of rules.
Use findings to prune.
Use natural thresholds and patterns to prune.
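Once a scatterplot suggests a cutoff, pruning is a Boolean filter on the rules table. A minimal sketch, assuming a rules table with the usual metric columns; the values and both thresholds are hypothetical and would normally be read off the plot.

```python
import pandas as pd

# Toy rules table with invented metric values.
rules = pd.DataFrame({
    'antecedent support': [0.30, 0.05, 0.20, 0.02],
    'consequent support': [0.25, 0.04, 0.15, 0.30],
    'lift': [1.8, 0.9, 1.2, 2.5],
})

# Hypothetical thresholds chosen by eye from a scatterplot.
min_antecedent_support = 0.10
min_lift = 1.0

# Keep rules that clear both thresholds.
pruned = rules[(rules['antecedent support'] >= min_antecedent_support)
               & (rules['lift'] > min_lift)]

print(pruned)
```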
Let's practice!
Parallel coordinates plot
What is a parallel coordinates plot?
When to use parallel coordinates plots
Parallel coordinates vs. heatmap.
Don't need intensity information.
Only want to know whether rule exists.
Want to reduce visual clutter.
Parallel coordinates vs. scatterplot.
Want individual rule information.
Not interested in multiple metrics.
Only want to examine final rules.
Preparing the data
import pandas as pd
from mlxtend.frequent_patterns import association_rules, apriori
# Load the one-hot encoded data
onehot = pd.read_csv('datasets/movies_onehot.csv')
# Generate frequent itemsets
frequent_itemsets = apriori(onehot, min_support = 0.10, use_colnames = True, max_len = 2)
# Generate association rules
rules = association_rules(frequent_itemsets, metric = 'support', min_threshold = 0.00)
Converting rules to coordinates
# Convert rules to coordinates.
rules['antecedent'] = rules['antecedents'].apply(lambda antecedent: list(antecedent)[0])
rules['consequent'] = rules['consequents'].apply(lambda consequent: list(consequent)[0])
rules['rule'] = rules.index
# Define coordinates and label
coords = rules[['antecedent','consequent','rule']]
# Print example
print(coords.head(1))
antecedent consequent rule
0 Dark Knight, The (2008) Inception (2010) 0
Generating a parallel coordinates plot
from pandas.plotting import parallel_coordinates
# Generate parallel coordinates plot
parallel_coordinates(coords, 'rule', colormap = 'ocean')
Refining a parallel coordinates plot
# Generate frequent itemsets
frequent_itemsets = apriori(onehot, min_support = 0.01, use_colnames = True, max_len = 2)
# Generate association rules
rules = association_rules(frequent_itemsets, metric = 'lift', min_threshold = 1.00)
# Generate coordinates
coords = rules_to_coordinates(rules)
# Generate parallel coordinates plot
parallel_coordinates(coords, 'rule')
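The slides call `rules_to_coordinates` without defining it. A minimal sketch of such a helper, assuming it simply wraps the conversion shown under "Converting rules to coordinates" and that every rule has a single antecedent and a single consequent (as with `max_len = 2`). The body is an assumption, not the course's own implementation.

```python
import pandas as pd

def rules_to_coordinates(rules):
    """Hypothetical helper: turn a rules table into plot coordinates."""
    coords = rules[['antecedents', 'consequents']].copy()
    # Take the single item out of each frozen set.
    coords['antecedent'] = coords['antecedents'].apply(lambda items: list(items)[0])
    coords['consequent'] = coords['consequents'].apply(lambda items: list(items)[0])
    # Use the row index as the rule label.
    coords['rule'] = coords.index
    return coords[['antecedent', 'consequent', 'rule']]

# Usage on a one-rule toy table built from the example in the slides.
toy_rules = pd.DataFrame({
    'antecedents': [frozenset({'Dark Knight, The (2008)'})],
    'consequents': [frozenset({'Inception (2010)'})],
})
coords = rules_to_coordinates(toy_rules)
print(coords)
```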
Let's practice!
Congratulations!
Isaiah Hull
Instructor
Transactions and itemsets
Transactions
TID   Transaction
1     MILK, BREAD, BISCUIT
...   ...
20    TEA, MILK, COFFEE, CEREAL
Itemsets
{MILK, BREAD}
{MILK, COFFEE, CEREAL}
Association rules and metrics
Association Rules
Use if-then structure.
If A then B.
Have antecedent(s) and consequent(s).
Many association rules.
Metrics
Measure strength of association.
Support, lift, confidence, conviction
Used to prune itemsets and rules.
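For reference, the four metrics named above can be written out using their standard definitions for a rule A -> B. Here support(A) and support(B) denote the share of transactions containing A and B respectively; the notation is a choice made for this summary.

```latex
\begin{align*}
\operatorname{support}(A \to B) &= \frac{\#\{\text{transactions containing } A \cup B\}}{\#\{\text{transactions}\}} \\
\operatorname{confidence}(A \to B) &= \frac{\operatorname{support}(A \to B)}{\operatorname{support}(A)} \\
\operatorname{lift}(A \to B) &= \frac{\operatorname{confidence}(A \to B)}{\operatorname{support}(B)} \\
\operatorname{conviction}(A \to B) &= \frac{1 - \operatorname{support}(B)}{1 - \operatorname{confidence}(A \to B)}
\end{align*}
```

A lift above 1 means A and B occur together more often than independence would predict; conviction grows without bound as confidence approaches 1.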
Pruning and aggregation
The Apriori algorithm
Visualizing rules
Congratulations!