Apriori Algorithm Examples
The Apriori algorithm is designed to discover frequent itemsets in a transactional database and derive association rules from those itemsets.
Problem:
Imagine you're a supermarket manager. You have a massive database of customer transactions
(i.e., what items each customer bought). You want to find out which items customers frequently buy together.
The Challenge:
The number of possible itemsets grows exponentially with the number of items. Manually
checking all combinations is computationally infeasible for large datasets.
The Apriori algorithm solves this problem by using a "bottom-up" approach based on the
"Apriori property":
Apriori Property: If an itemset is frequent, then all of its subsets must also be frequent.
1. Generate Candidate Itemsets (C1): Create a list of all individual items (1-itemsets).
2. Calculate Support: Scan the database and count the occurrences of each itemset in C1.
Calculate the support for each itemset (support = number of transactions containing the
itemset / total number of transactions).
3. Prune Infrequent Itemsets (L1): Define a minimum support threshold. Remove
itemsets from C1 that have support below this threshold, creating L1 (frequent 1-
itemsets).
4. Generate Candidate Itemsets (Ck): Generate new candidate itemsets (Ck) of size k by
combining frequent itemsets from L(k-1). Only combine itemsets that share the first (k-2)
items.
5. Calculate Support: Scan the database and calculate the support for each itemset in Ck.
6. Prune Infrequent Itemsets (Lk): Remove itemsets from Ck that have support below the
minimum support threshold, creating Lk (frequent k-itemsets).
7. Repeat Steps 4-6: Continue generating and pruning candidate itemsets until no more
frequent itemsets can be found.
8. Generate Association Rules: From the frequent itemsets, generate association rules
based on confidence (confidence = support(A∪B) / support(A)). Define a minimum
confidence threshold to filter out weak rules.
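The eight steps above can be sketched in a few lines of Python. This is a minimal illustration, not an optimized implementation, and the tiny transaction database at the bottom is invented for the demo:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Frequent-itemset mining following steps 1-7 above (a minimal sketch)."""
    transactions = [frozenset(t) for t in transactions]

    def support_count(itemset):
        return sum(1 for t in transactions if itemset <= t)

    # Steps 1-3: candidate 1-itemsets, count them, prune -> L1
    items = {i for t in transactions for i in t}
    current = {frozenset([i]) for i in items
               if support_count(frozenset([i])) >= min_support}
    frequent = set(current)
    k = 2
    while current:
        # Step 4: join L(k-1) with itself to form size-k candidates,
        # keeping only candidates whose (k-1)-subsets are all frequent
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        # Steps 5-6: count support and prune -> Lk
        current = {c for c in candidates if support_count(c) >= min_support}
        frequent |= current
        k += 1  # Step 7: repeat for larger itemsets
    return frequent

# Tiny invented demo database
db = [{"milk", "bread"}, {"milk", "butter"},
      {"milk", "bread", "butter"}, {"bread", "butter"}]
frequent_sets = apriori(db, min_support=2)
```

With this toy database every single item and every pair is frequent, but the triple {milk, bread, butter} appears in only one transaction and is pruned.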
Example:
Scenario:
o An online bookstore wants to analyze customer purchase patterns to recommend
books and create bundled offers.
o Transaction data: Each transaction represents a customer's order, containing the
books they purchased.
Transactional Data:
o T1: {Novel, Mystery, Thriller}
o T2: {Novel, Mystery, Science Fiction}
o T3: {Mystery, Cookbook}
o T4: {Novel, Mystery}
o T5: {Novel, Cookbook, Science Fiction}
Process:
o 1. Setting Parameters:
Minimum Support: 3 (an itemset must appear in at least 3 transactions)
We will focus on finding the frequent itemsets.
o 2. Finding Frequent 1-Itemsets (L1):
Count occurrences:
Novel: 4
Mystery: 4
Thriller: 1
Science Fiction: 2
Cookbook: 2
L1: {Novel}, {Mystery} (Thriller, Science Fiction, and Cookbook are
below the support threshold)
o 3. Generating Candidate 2-Itemsets (C2):
C2: {Novel, Mystery}
o 4. Finding Frequent 2-Itemsets (L2):
Count occurrences:
{Novel, Mystery}: 3 (T1, T2, T4)
L2: {Novel, Mystery}
o 5. Generating Candidate 3-Itemsets (C3):
Because L2 contains only one itemset, C3 is empty.
o 6. Results:
Frequent Itemsets: {Novel}, {Mystery}, {Novel, Mystery}
o 7. Association Rules (Example):
{Novel} -> {Mystery} : This indicates that customers who buy Novels
often also buy Mystery books.
Application:
o The bookstore can recommend Mystery books to customers who purchase
Novels.
o They could also create a bundled offer for Novels and Mystery books.
Scenario:
o A grocery store wants to optimize product placement.
o Transaction data: Records of customer purchases.
Transactional Data:
o T1: {Milk, Bread, Eggs}
o T2: {Milk, Yogurt}
o T3: {Bread, Diapers, Beer}
o T4: {Milk, Bread, Diapers, Yogurt}
o T5: {Yogurt, Eggs}
Process:
o 1. Setting Parameters:
Minimum Support: 2
o 2. Finding Frequent 1-Itemsets (L1):
Milk: 3, Bread: 3, Eggs: 2, Yogurt: 3, Diapers: 2, Beer: 1.
L1: {Milk}, {Bread}, {Eggs}, {Yogurt}, {Diapers}
o 3. Generating Candidate 2-Itemsets (C2):
C2: {Milk, Bread}, {Milk, Eggs}, {Milk, Yogurt}, {Milk, Diapers},
{Bread, Eggs}, {Bread, Yogurt}, {Bread, Diapers}, {Eggs, Yogurt},
{Eggs, Diapers}, {Yogurt, Diapers}
o 4. Finding Frequent 2-Itemsets (L2):
After counting support ({Milk, Bread}: 2, {Milk, Yogurt}: 2,
{Bread, Diapers}: 2; every other pair appears at most once),
L2: {Milk, Bread}, {Milk, Yogurt}, {Bread, Diapers}
o 5. Generating Candidate 3-Itemsets (C3):
Candidates such as {Milk, Bread, Yogurt} and {Milk, Bread, Diapers} are
pruned because their subsets {Bread, Yogurt} and {Milk, Diapers} are not
in L2, so C3 is empty.
o 6. Results:
Frequent Itemsets: {Milk}, {Bread}, {Eggs}, {Yogurt}, {Diapers}, {Milk,
Bread}, {Milk, Yogurt}, {Bread, Diapers}
o 7. Association rules (Example):
{Diapers} -> {Bread} (confidence = 2/2 = 100%)
Application:
o The store can place Bread and Diapers close together, and Milk near both
Bread and Yogurt.
o This information can be used for sales promotions.
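The supports in the grocery example can be cross-checked by recomputing them directly from the five transactions. A quick sketch (variable names are my own):

```python
from itertools import combinations

# The five grocery transactions from the example above
transactions = [
    {"Milk", "Bread", "Eggs"},
    {"Milk", "Yogurt"},
    {"Bread", "Diapers", "Beer"},
    {"Milk", "Bread", "Diapers", "Yogurt"},
    {"Yogurt", "Eggs"},
]
MIN_SUPPORT = 2

# L1: frequent single items
counts = {}
for t in transactions:
    for item in t:
        counts[item] = counts.get(item, 0) + 1
L1 = sorted(item for item, c in counts.items() if c >= MIN_SUPPORT)

# C2: every pair of frequent items (the step-4 join for k = 2)
C2 = list(combinations(L1, 2))

# L2: pairs meeting the support threshold
L2 = [pair for pair in C2
      if sum(1 for t in transactions if set(pair) <= t) >= MIN_SUPPORT]
```

Running this confirms that only {Milk, Bread}, {Milk, Yogurt}, and {Bread, Diapers} reach the minimum support of 2.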
The Apriori algorithm efficiently narrows down the search for frequent itemsets.
The support threshold is crucial in determining which itemsets are considered frequent.
Association rules provide valuable insights into relationships between items.
Scenario:
o An e-commerce website selling various electronics wants to analyze customer
purchase patterns to optimize product recommendations and promotional offers.
o The dataset includes transactions with items like laptops, headphones, keyboards,
mice, and external hard drives.
Transactional Data:
o T1: {Laptop, Headphones, Keyboard}
o T2: {Laptop, Mouse}
o T3: {Headphones, External Hard Drive}
o T4: {Laptop, Headphones}
o T5: {Laptop, Keyboard, Mouse}
o T6: {Headphones, Keyboard}
o T7: {Laptop, Headphones, External Hard Drive}
o T8: {Keyboard, Mouse}
o T9: {Laptop, Mouse}
o T10: {Headphones, External Hard Drive, Keyboard}
Parameters:
o Minimum Support: 3
o Minimum Confidence: 60%
Process:
o 1. Finding Frequent 1-Itemsets (L1):
Count occurrences:
Laptop: 6
Headphones: 6
Keyboard: 5
Mouse: 4
External Hard Drive: 3
All five items meet the minimum support of 3, so
L1: {Laptop}, {Headphones}, {Keyboard}, {Mouse}, {External Hard Drive}
o 2. Generating Candidate 2-Itemsets (C2):
C2: {Laptop, Headphones}, {Laptop, Keyboard}, {Laptop, Mouse},
{Laptop, External Hard Drive}, {Headphones, Keyboard}, {Headphones,
Mouse}, {Headphones, External Hard Drive}, {Keyboard, Mouse},
{Keyboard, External Hard Drive}, {Mouse, External Hard Drive}
o 3. Finding Frequent 2-Itemsets (L2):
After counting support:
{Laptop, Headphones}: 3
{Laptop, Mouse}: 3
{Headphones, Keyboard}: 3
{Headphones, External Hard Drive}: 3
(All other pairs, including {Keyboard, Mouse} with a count of 2, fall
below the threshold.)
L2: {Laptop, Headphones}, {Laptop, Mouse}, {Headphones, Keyboard},
{Headphones, External Hard Drive}
o 4. Generating Candidate 3-Itemsets (C3):
The join step produces {Laptop, Headphones, Mouse} and {Headphones,
Keyboard, External Hard Drive}, but both are pruned because their
subsets {Headphones, Mouse} and {Keyboard, External Hard Drive} are
not in L2.
o 5. Finding Frequent 3-Itemsets (L3):
L3: {} Because C3 is empty, no frequent 3-itemsets exist.
o 6. Generating Association Rules:
From L2, with minimum confidence 60%:
{Laptop} -> {Headphones}: Support = 3/10, Confidence = 3/6 =
50% (rejected)
{Headphones} -> {Laptop}: Support = 3/10, Confidence = 3/6 =
50% (rejected)
{Laptop} -> {Mouse}: Support = 3/10, Confidence = 3/6 = 50%
(rejected)
{Mouse} -> {Laptop}: Support = 3/10, Confidence = 3/4 = 75%
(accepted)
{Headphones} -> {Keyboard}: Support = 3/10, Confidence = 3/6
= 50% (rejected)
{Keyboard} -> {Headphones}: Support = 3/10, Confidence = 3/5
= 60% (accepted)
{Headphones} -> {External Hard Drive}: Support = 3/10,
Confidence = 3/6 = 50% (rejected)
{External Hard Drive} -> {Headphones}: Support = 3/10,
Confidence = 3/3 = 100% (accepted)
Results and Applications:
o The e-commerce website can use these rules to:
Recommend laptops and laptop accessories to customers who purchase mice.
Recommend headphones to customers who buy keyboards or external hard drives.
Create bundled offers for laptops and mice.
Place headphones and keyboards near each other on the webpage.
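The support and confidence figures for this example can be recomputed directly from the ten transactions. A small sketch (helper names are my own):

```python
# The ten electronics transactions from the example above
transactions = [
    {"Laptop", "Headphones", "Keyboard"},
    {"Laptop", "Mouse"},
    {"Headphones", "External Hard Drive"},
    {"Laptop", "Headphones"},
    {"Laptop", "Keyboard", "Mouse"},
    {"Headphones", "Keyboard"},
    {"Laptop", "Headphones", "External Hard Drive"},
    {"Keyboard", "Mouse"},
    {"Laptop", "Mouse"},
    {"Headphones", "External Hard Drive", "Keyboard"},
]

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t) / len(transactions)

def confidence(antecedent, consequent):
    """confidence(A -> B) = support(A union B) / support(A)."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)

conf = confidence({"Mouse"}, {"Laptop"})
```

For instance, `confidence({"Mouse"}, {"Laptop"})` is 0.75, since three of the four mouse transactions also contain a laptop.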
Problem Statement
In large transactional databases, identifying frequent itemsets (groups of items that appear
together frequently) is a critical challenge for applications like market basket analysis.
Traditional methods of finding frequent itemsets can be computationally expensive due to the
exponential number of possible item combinations.
The Apriori Algorithm efficiently finds frequent itemsets using the Apriori property, which
states that every subset of a frequent itemset must itself be frequent.
This property helps in reducing the search space by eliminating infrequent itemsets early.
Transaction ID Items Purchased
2 {Milk, Bread}
3 {Milk, Butter}
4 {Bread, Butter}
Let's set the minimum support count = 2 (an itemset must appear in at least 2 transactions to be
considered frequent).
Frequent 1-itemsets:
{Milk} → 4
{Bread} → 4
{Butter} → 3
Frequent 2-itemsets:
{Milk, Bread} → 3
{Milk, Butter} → 3
{Bread, Butter} → 3
All of these meet the minimum support count of 2, so all are frequent.
4. No more frequent itemsets can be found (the search stops once no candidate passes the
minimum support check).
These rules help businesses understand purchasing patterns and suggest relevant items to
customers.
Conclusion
The Apriori algorithm is an efficient way to extract frequent itemsets and generate association
rules, helping businesses make data-driven decisions in areas like recommendation systems,
inventory management, and cross-selling.
A supermarket wants to analyze customer purchase patterns to offer better discounts and
improve recommendations.
Dataset (Transactions)
Transaction ID Items Purchased
3 {Apple, Milk}
Step 1: Frequent 1-itemsets
Item Count
Apple 4
Banana 4
Milk 4
Bread 3
Step 2: Frequent 2-itemsets
Itemset Count
{Apple, Banana} 3
{Apple, Milk} 3
{Banana, Milk} 3
{Banana, Bread} 3
{Bread, Milk} 2
Business Insight
Customers who buy Apple and Banana are likely to buy Milk.
A discount on Banana & Bread could increase Milk sales.
Dataset (Transactions)
Transaction ID Products Purchased
2 {Laptop, Mouse}
3 {Laptop, Keyboard}
Step 1: Frequent 1-itemsets
Item Count
Laptop 4
Mouse 4
Keyboard 4
Headphones 2
Step 2: Frequent 2-itemsets
Itemset Count
{Laptop, Mouse} 3
{Laptop, Keyboard} 3
{Mouse, Keyboard} 3
{Mouse, Headphones} 2
The 3-itemset {Laptop, Mouse, Keyboard} is frequent.
3. {Laptop, Keyboard} → {Mouse}
o Support = 3/5 = 60%
o Confidence = 3/4 = 75% ✅
Business Insights
Customers buying a Laptop and Mouse often buy a Keyboard.
Customers buying Keyboards & Mice frequently buy Laptops.
The store can bundle Laptop, Mouse & Keyboard as a package deal.
A hospital wants to analyze common symptoms that appear together to improve disease
diagnosis.
Dataset (Transactions)
Transaction ID Symptoms
2 {Fever, Cough}
3 {Cough, Fatigue}
4 {Fever, Fatigue}
Step 1: Frequent 1-itemsets
Symptom Count
Fever 4
Cough 4
Fatigue 3
Step 2: Frequent 2-itemsets
Symptom Pair Count
{Fever, Cough} 3
{Fever, Fatigue} 3
{Cough, Fatigue} 3
The 3-itemset {Fever, Cough, Fatigue} is frequent.
Medical Insights
Patients with Fever and Cough are likely to develop Fatigue.
Doctors should monitor Cough & Fatigue for potential Fever onset.
Mining association rules is a key technique in data mining, used to discover relationships
between items in large transactional databases.
A single-dimensional Boolean association rule means:
Single-dimensional: The rule involves only one attribute (e.g., "Items Purchased").
Boolean: The presence (1) or absence (0) of an item is considered (no quantity or other
attributes).
2. Problem Definition
Given a transactional database, we aim to find frequent itemsets and derive association rules
based on the Apriori algorithm.
Goal
A frequent itemset is an itemset that appears in the transactions at least min_support times.
Association rules are derived from frequent itemsets and must meet min_confidence.
Item Frequency
Milk 4
Bread 4
Butter 3
Itemset Frequency
{Milk, Bread} 3
{Milk, Butter} 3
{Bread, Butter} 3
Itemset Frequency
{Milk, Bread, Butter} 2
1. {Milk} → {Bread}
o Support = 3/5 = 60%
o Confidence = 3/4 = 75% ✅ (Accepted)
2. {Bread} → {Milk}
o Support = 3/5 = 60%
o Confidence = 3/4 = 75% ✅ (Accepted)
3. {Milk} → {Butter}
o Support = 3/5 = 60%
o Confidence = 3/4 = 75% ✅ (Accepted)
Customers buying Milk and Bread are likely to buy Butter. → Recommend Butter to
such customers.
Discount on Bread might increase Milk sales.
Bundle Milk, Bread & Butter together as a combo deal.
7. Conclusion
Single-dimensional Boolean association rules help find item relationships in
transactions.
The Apriori Algorithm efficiently finds frequent itemsets.
Support & Confidence metrics help derive strong association rules.
2. Problem Definition
Example: Retail Transaction Database
Goal
Assume min_support = 2.
Itemset Frequency
Age: 25-34 3
Male 3
Female 2
NY 2
Chicago 2
Milk 4
Bread 4
Butter 3
Itemset Frequency
{Age: 25-34, Milk} 3
{Male, Bread} 3
{Milk, Bread} 3
{NY, Milk} 2
{Chicago, Milk} 2
Itemset Frequency
{Age: 25-34, Milk, Bread} 2
{Male, Milk, Bread} 2
6. Conclusion
Multi-dimensional association rule mining helps uncover hidden relationships between
multiple attributes.
Relational databases are transformed into Boolean transaction format for mining.
The Apriori Algorithm is applied to find frequent itemsets & rules.
Business insights allow for better marketing strategies.
Here are some additional real-world examples where multi-dimensional association rule mining
is applied:
Frequent Itemsets
{Smoking: Yes, Exercise: No} → {Heart Disease}
{Age: 40-50, Smoking: Yes} → {Heart Disease}
{Age: 50-60, No Exercise} → {Diabetes}
Insights
Patients who smoke and do not exercise are highly likely to develop heart disease. →
Preventive health programs should target this group.
Elderly patients (50-60) with no exercise habits have a higher risk of diabetes. →
Health awareness campaigns should focus on promoting exercise.
Frequent Itemsets
{Age: 20-30, Online Purchase: Yes, Large Amount: Yes} → {Fraudulent}
{Region: USA, Online Purchase: Yes} → {Fraudulent}
Insights
Young customers (20-30) making large online purchases are at high risk of fraud. →
Banks should flag such transactions for verification.
Online transactions from the USA show a higher fraud rate. → Implement additional
security for online transactions in this region.
Frequent Itemsets
{Age: 20-30, Browsed: Electronics} → {Purchased: Accessories}
{Age: 30-40, Browsed: Clothing} → {Purchased: Shoes}
Insights
Young males (20-30) browsing electronics often buy accessories. → Show targeted
ads for accessories when they browse electronics.
Women aged 30-40 browsing clothing often buy shoes. → Suggest shoe
recommendations during clothing purchases.
Frequent Itemsets
{Season: Winter, Customer: Tourist} → {Jacket}
{Season: Summer, Customer: Local} → {Cold Drinks}
Insights
Tourists in winter mostly buy jackets. → Offer discount bundles for jackets to tourists.
Locals in summer buy cold drinks. → Increase cold drink promotions in local stores
during summer.
Frequent Itemsets
{Age: 20-30, Gender: Male, Interest: Sports} → {Clicked Ad}
{Interest: Fitness} → {Clicked Ad}
Insights
Young males (20-30) interested in sports are more likely to click on ads. → Target
this group for sports product ads.
Users interested in fitness click on ads frequently. → Increase fitness-related ad
campaigns.
Conclusion
Goal
Itemset Frequency
Age: 20-30 3
Male 3
Browsed: Electronics 3
Purchased: Accessories 2
Itemset Frequency
{Age: 20-30, Browsed: Electronics} 3
{Age: 20-30, Male} 3
{Male, Browsed: Electronics} 3
{Browsed: Electronics, Purchased: Accessories} 2
Itemset Frequency
{Age: 20-30, Browsed: Electronics, Purchased: Accessories} 2
{Male, Browsed: Electronics, Purchased: Accessories} 2
Transactional Databases:
o These databases store records of transactions, where each transaction consists of a
set of items.
o For example, a supermarket's transactional database would record each customer's
purchase as a separate transaction, with the items bought listed within that
transaction.
Boolean Association Rules:
o These rules deal with the presence or absence of items, rather than their
quantities.
o They express relationships in the form of "if A, then B," where A and B are sets
of items.
o "Boolean" signifies that we're concerned with whether an item is present (true) or
absent (false) in a transaction.
Single-Dimensional:
o This means that the association rules involve only one attribute or dimension. In
the context of market basket analysis, this typically refers to the items purchased.
o Contrast this with multi-dimensional association rules, which might involve
attributes like customer age, income, or location.
Association Rule Mining:
o The goal is to discover interesting relationships or patterns between items in a
transactional database.
o These relationships are expressed as association rules.
Key Metrics:
Support:
o The proportion of transactions that contain both itemset A and itemset B.
o Support(A → B) = (transactions containing A ∪ B) / (total number of
transactions)
Confidence:
o The proportion of transactions containing itemset A that also contain itemset B.
o Confidence(A → B) = (transactions containing A ∪ B) / (number of
transactions containing A)
Example:
Let's consider a simple transactional database of a small grocery store:
Transactions:
o T1: {Bread, Milk}
o T2: {Bread, Diapers, Beer, Eggs}
o T3: {Milk, Diapers, Beer, Cola}
o T4: {Bread, Milk, Diapers, Beer}
o T5: {Bread, Milk, Diapers, Cola}
1. Frequent Itemsets:
o We need to find itemsets that appear frequently in the transactions.
o Let's set a minimum support threshold of 3.
o Frequent itemsets:
{Bread}: 4
{Milk}: 4
{Diapers}: 4
{Beer}: 3
{Bread, Milk}: 3
{Bread, Diapers}: 3
{Milk, Diapers}: 3
{Diapers, Beer}: 3
(Note that {Bread, Milk, Diapers} appears in only 2 transactions, so no
3-itemset reaches the threshold.)
2. Generating Association Rules:
o From the frequent itemsets, we can generate association rules.
o Let's set a minimum confidence threshold of 60%.
o Example rules:
{Bread} → {Milk}:
Support = 3/5 = 60%
Confidence = 3/4 = 75%
{Diapers} → {Beer}:
Support = 3/5 = 60%
Confidence = 3/4 = 75%
{Milk} -> {Diapers}:
Support = 3/5 = 60%
Confidence = 3/4 = 75%
{Beer} -> {Diapers}:
Support = 3/5 = 60%
Confidence = 3/3 = 100%
Interpretation:
The rule "{Bread} → {Milk}" with 75% confidence means that 75% of the customers
who bought bread also bought milk.
The rule "{Diapers} -> {Beer}" with 75% confidence means that 75% of the customers
that bought diapers also bought beer.
The rule "{Beer} -> {Diapers}" with 100% confidence means that every customer who
bought beer also bought diapers.
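These figures can be verified by counting directly over the five transactions. A minimal sketch (helper names are my own):

```python
# The five transactions from the grocery example above
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Milk", "Diapers", "Beer", "Cola"},
    {"Bread", "Milk", "Diapers", "Beer"},
    {"Bread", "Milk", "Diapers", "Cola"},
]

def support_count(itemset):
    """Number of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def confidence(antecedent, consequent):
    """confidence(A -> B) = support_count(A | B) / support_count(A)."""
    return support_count(antecedent | consequent) / support_count(antecedent)

bread_milk = confidence({"Bread"}, {"Milk"})      # 3/4
beer_diapers = confidence({"Beer"}, {"Diapers"})  # 3/3
```

Confidence is direction-sensitive: {Beer} → {Diapers} holds in every beer transaction, while {Diapers} → {Beer} holds in only three of the four diaper transactions.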
Applications:
By mining single-dimensional Boolean association rules, businesses can gain valuable insights
into customer behavior and optimize their strategies.
Discretization:
o Converting continuous quantitative attributes into discrete intervals.
o This can be done statically (before mining) or dynamically (during mining).
o Example: "age" can be discretized into "age = 18-25," "age = 26-35," etc.
Concept Hierarchies:
o Organizing attributes into hierarchical levels of abstraction.
o Example: "location" can have levels like "city," "state," and "country."
o This allows us to discover rules at different levels of detail.
Mining Techniques:
o Adapting traditional association rule algorithms (like Apriori) to handle multiple
dimensions.
o Developing new techniques that can efficiently handle the increased complexity.
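Static discretization, as described above, can be as simple as a binning function. This sketch reuses the interval labels from the "age" example; the open-ended "36+" bucket is my own addition to make the function total:

```python
def discretize_age(age):
    """Map a numeric age to one of the discrete intervals from the example.
    The "age = 36+" bucket is an assumed catch-all, not from the original text."""
    if 18 <= age <= 25:
        return "age = 18-25"
    if 26 <= age <= 35:
        return "age = 26-35"
    return "age = 36+"

labels = [discretize_age(a) for a in (19, 27, 44)]
```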
Example:
Process:
1. Data Preprocessing:
o Clean and transform the data.
o Discretize quantitative attributes.
o Define concept hierarchies.
2. Frequent Itemset Generation:
o Find combinations of attribute-value pairs that meet a minimum support
threshold.
o This involves considering combinations across multiple dimensions.
3. Rule Generation:
o Generate association rules from the frequent itemsets.
o Calculate confidence and other metrics.
4. Rule Evaluation:
o Filter out uninteresting or redundant rules.
o Use domain knowledge to evaluate the significance of the rules.
Applications:
Targeted Marketing:
o Identifying customer segments that are likely to respond to specific promotions.
Sales Forecasting:
o Predicting product demand based on multiple factors.
Fraud Detection:
o Identifying unusual patterns that may indicate fraudulent activity.
Business Intelligence:
o Gaining deeper insights into business trends and customer behavior.
Let's create a more concrete example to illustrate multi-dimensional association rule mining.
We have a data warehouse for an electronics retailer with the following dimensions:
Customer:
o Age (quantitative, discretized into ranges)
o Location (categorical: City)
Product:
o Category (categorical)
o Price (quantitative, discretized into ranges)
Time:
o Season (categorical)
Mining Process:
1. Data Preprocessing:
o Age is discretized into ranges (20-30, 30-40, 40-50).
o Price is discretized into ranges (Low, Medium, High).
o Other attributes are already categorical.
2. Frequent Itemset Generation:
o Let's set a minimum support threshold of 3 (meaning an itemset must appear in at
least 3 transactions).
o We'll look for combinations of attribute-value pairs.
o Some frequent itemsets:
{Customer Age = 20-30}: 5
{Customer Location = London}: 5
{Product Category = Laptop}: 4
{Product Price = High}: 3
{Season = Winter}: 4
{Customer Age = 20-30, Customer Location = London}: 4
{Customer Age = 20-30, Product Category = Laptop}: 3
{Customer Age = 20-30, Product Price = High}: 3
{Customer Age = 20-30, Season = Winter}: 4
{Customer Location = London, Product Category = Laptop}: 3
{Customer Location = London, Season = Winter}: 4
{Product Category = Laptop, Product Price = High}: 3
{Product Category = Laptop, Season = Winter}: 3
3. Rule Generation:
o Let's set a minimum confidence threshold of 60%.
o Example rules:
{Customer Age = 20-30} -> {Customer Location = London} (support =
4/10, confidence = 4/5 = 80%)
{Customer Age = 20-30} -> {Product Category = Laptop} (support =
3/10, confidence = 3/5 = 60%)
{Product Category = Laptop} -> {Product Price = High} (support = 3/10,
confidence = 3/4 = 75%)
{Customer Age = 20-30, Customer Location = London} -> {Product
Category = Laptop} (support = 3/10, confidence = 3/4 = 75%)
{Customer Age = 20-30, Customer Location = London} -> {Season =
Winter} (support = 4/10, confidence = 4/4 = 100%)
4. Rule Interpretation:
o "80% of customers aged 20-30 are located in London."
o "60% of customers aged 20-30 purchase Laptops."
o "75% of Laptop purchases are high priced."
o "75% of customers aged 20-30 and living in London buy laptops."
o "All of the customers aged 20-30 that live in London, purchase items during the
winter season."
Applications:
Targeted Promotions:
o The retailer can target 20-30-year-olds in London with laptop promotions during
the winter.
Product Placement:
o High-priced laptops can be prominently displayed in stores or online for 20-30-
year-old customers.
Inventory Management:
o Ensure sufficient stock of laptops in London during the winter season.
This example shows how multi-dimensional association rule mining can uncover valuable
insights that go beyond simple item-to-item relationships.
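One common way to reuse the single-dimensional machinery for multi-dimensional data, as in step 2 of the process above, is to encode each record as a set of attribute=value "items" and then count them like ordinary itemsets. A small sketch with invented customer records (not the retailer's actual data):

```python
from itertools import combinations

# Hypothetical, already-discretized customer records for illustration
records = [
    {"age": "20-30", "city": "London", "season": "Winter"},
    {"age": "20-30", "city": "London", "season": "Winter"},
    {"age": "30-40", "city": "Paris",  "season": "Summer"},
    {"age": "20-30", "city": "London", "season": "Winter"},
]

# Encode each record as a set of attribute=value "items"
baskets = [{f"{k}={v}" for k, v in r.items()} for r in records]

# Count every pair of attribute-value items, exactly as for ordinary 2-itemsets
pair_counts = {}
for b in baskets:
    for pair in combinations(sorted(b), 2):
        pair_counts[pair] = pair_counts.get(pair, 0) + 1
```

After this encoding, any standard Apriori implementation can mine cross-dimension itemsets such as {age=20-30, city=London} without modification.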
Key Concepts
Types of Constraints
1. Knowledge-Based Constraints: Based on domain knowledge (e.g., only mining rules related to
electronics).
2. Data-Based Constraints: Based on data properties (e.g., ignore infrequent items).
3. Interestingness Constraints: Based on statistical measures like support, confidence, and lift.
4. Monotonic Constraints: If an itemset satisfies the constraint, every superset of it also satisfies the
constraint.
5. Anti-Monotonic Constraints: If an itemset violates the constraint, every superset of it also violates
the constraint.
Benefits
Constraint-based association rule mining enhances the traditional association rule mining process
by applying various constraints that refine the results. Let’s go through different types of
constraints with detailed examples.
1. Item Constraints
Item constraints restrict the rules to contain specific items or exclude certain items.
Example:
✅ Use Case: If a retailer wants to analyze only dairy product purchases, they can apply an item
constraint to filter out non-dairy products.
2. Length Constraints
These constraints control the number of items in association rules.
Example:
✅ Use Case: A business may want to focus only on simple purchase relationships (2-item rules)
instead of complex multi-item rules.
3. Support Constraints
Support measures how frequently an itemset appears in the dataset. A minimum support
constraint ensures that only frequent itemsets are considered.
Example:
✅ Use Case: In retail, support constraints help filter out rarely occurring itemsets that are not
useful for marketing strategies.
4. Lift Constraints
Lift measures how much more often two itemsets occur together than would be expected if they
were independent. A minimum lift constraint keeps only rules with a strong positive correlation.
Example:
✅ Use Case: In targeted promotions, a store might focus only on rules with high lift values,
ensuring strong purchase relationships.
5. Monotonic Constraints
A monotonic constraint means that if an itemset satisfies the constraint, then any of its supersets
will also satisfy it.
Example:
A constraint such as "the total price of the itemset is at least $50" is monotonic: once
{Shirt, Jeans} reaches $50, any superset like {Shirt, Jeans, Shoes} must satisfy the
constraint as well.
✅ Use Case: Once a smaller itemset satisfies a monotonic constraint, none of its supersets need
to be re-checked against that constraint.
6. Anti-Monotonic Constraints
An anti-monotonic constraint means that if an itemset violates the constraint, then all its
supersets will also violate it.
Example:
Suppose a store wants to analyze rules where the total price of items is below $50.
{Milk ($10), Bread ($15), Butter ($20)} → Total = $45 ✅ (Valid)
{Milk ($10), Bread ($15), Butter ($20), Cheese ($30)} → Total = $75 ❌ (Invalid)
Since {Milk, Bread, Butter, Cheese} exceeds $50, any superset containing these items will
also be invalid.
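The pruning benefit of anti-monotonicity can be sketched as a level-wise search that never extends an itemset already over budget (prices taken from the example above):

```python
# Prices from the example above; constraint: total price below $50
prices = {"Milk": 10, "Bread": 15, "Butter": 20, "Cheese": 30}
MAX_TOTAL = 50

def total(itemset):
    return sum(prices[i] for i in itemset)

# Level-wise search: an itemset at or over the budget is never extended,
# because every superset would violate the constraint too (anti-monotone).
valid = {frozenset([i]) for i in prices if total([i]) < MAX_TOTAL}
level = set(valid)
while level:
    nxt = {s | {i} for s in level for i in prices
           if i not in s and total(s | {i}) < MAX_TOTAL}
    valid |= nxt
    level = nxt
```

Here {Milk, Bread, Butter} ($45) survives, while {Butter, Cheese} ($50) is cut at the pair level, so no superset of it is ever generated.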
Real-World Applications
1. Market Basket Analysis (Retail)