Apriori Algorithm Examples

The Apriori algorithm is used in association rule mining to identify frequent itemsets and derive association rules from customer transaction data. It employs a bottom-up approach, generating candidate itemsets and pruning infrequent ones based on support and confidence thresholds. This technique aids in product placement, targeted marketing, and inventory management by revealing item relationships and purchasing patterns.

The Apriori algorithm is a classic algorithm used in association rule mining.

It's designed to
discover frequent itemsets in a transactional database and derive association rules from those
itemsets.

Problem:

Imagine you're a supermarket manager. You have a massive database of customer transactions
(i.e., what items each customer bought). You want to find out:

1. Frequently bought itemsets: Which combinations of items are often purchased


together? (e.g., bread and milk, beer and diapers).
2. Association rules: What are the relationships between these items? (e.g., if a customer
buys bread, they are likely to also buy milk).

This information can be used for:

 Product placement: Placing frequently bought items together to increase sales.


 Targeted marketing: Offering promotions on items often purchased together.
 Inventory management: Ensuring popular combinations are always in stock.

The Challenge:

The number of possible itemsets grows exponentially with the number of items. Manually
checking all combinations is computationally infeasible for large datasets.

Solution: The Apriori Algorithm

The Apriori algorithm solves this problem by using a "bottom-up" approach based on the
"Apriori property":

 Apriori Property: If an itemset is frequent, then all of its subsets must also be frequent.

Steps of the Algorithm:

1. Generate Candidate Itemsets (C1): Create a list of all individual items (1-itemsets).
2. Calculate Support: Scan the database and count the occurrences of each itemset in C1.
Calculate the support for each itemset (support = number of transactions containing the
itemset / total number of transactions).
3. Prune Infrequent Itemsets (L1): Define a minimum support threshold. Remove
itemsets from C1 that have support below this threshold, creating L1 (frequent 1-
itemsets).
4. Generate Candidate Itemsets (Ck): Generate new candidate itemsets (Ck) of size k by
combining frequent itemsets from L(k-1). Only combine itemsets that share the first (k-2)
items.
5. Calculate Support: Scan the database and calculate the support for each itemset in Ck.
6. Prune Infrequent Itemsets (Lk): Remove itemsets from Ck that have support below the
minimum support threshold, creating Lk (frequent k-itemsets).
7. Repeat Steps 4-6: Continue generating and pruning candidate itemsets until no more
frequent itemsets can be found.
8. Generate Association Rules: From the frequent itemsets, generate association rules
based on confidence (confidence = support(A∪B) / support(A)). Define a minimum
confidence threshold to filter out weak rules.
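The level-wise search in steps 1-7 can be sketched in Python. This is a minimal, unoptimized illustration (function and variable names are my own, not from a particular library); running it on a small transaction list returns every frequent itemset with its support count.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset.

    transactions: iterable of sets of items; min_support: absolute count.
    A direct sketch of steps 1-7 above, not an optimized implementation.
    """
    transactions = [set(t) for t in transactions]

    def count(candidates):
        # One database scan: how many transactions contain each candidate?
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    # Steps 1-3: candidate 1-itemsets (C1), then prune to L1
    c1 = {frozenset([item]) for t in transactions for item in t}
    freq = {c: s for c, s in count(c1).items() if s >= min_support}
    result, k = dict(freq), 2
    # Steps 4-7: build Ck from L(k-1), count support, prune, repeat
    while freq:
        candidates = {a | b for a in freq for b in freq if len(a | b) == k}
        # Apriori property: every (k-1)-subset of a candidate must be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in freq for s in combinations(c, k - 1))}
        freq = {c: s for c, s in count(candidates).items() if s >= min_support}
        result.update(freq)
        k += 1
    return result
```

On the worked example below (minimum support 2), this returns the seven frequent itemsets {A}, {B}, {C}, {F}, {A,C}, {B,C}, {B,F}.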


Example:

Transactional Database:

Transaction ID Items Bought


T1 A, B, C
T2 A, C
T3 A, D
T4 B, E, F
T5 B, C, F

Minimum Support Threshold: 2 (meaning an itemset must appear in at least 2 transactions)

Minimum Confidence Threshold: 60%

Steps:

1. C1: { {A}, {B}, {C}, {D}, {E}, {F} }


2. Support(C1):
o {A}: 3
o {B}: 3
o {C}: 3
o {D}: 1
o {E}: 1
o {F}: 2
3. L1: { {A}, {B}, {C}, {F} } (D and E are removed due to low support)
4. C2: { {A,B}, {A,C}, {A,F}, {B,C}, {B,F}, {C,F} }
5. Support(C2):
o {A,B}: 1
o {A,C}: 2
o {A,F}: 0
o {B,C}: 2
o {B,F}: 2
o {C,F}: 1
6. L2: { {A,C}, {B,C}, {B,F} }
7. C3: { {B,C,F} }
8. Support(C3):
o {B,C,F}: 1
9. L3: {} because the support of {B,C,F} is 1, which is less than the minimum support of 2.
10. Frequent Itemsets: { {A}, {B}, {C}, {F}, {A,C}, {B,C}, {B,F} }
11. Association Rules (Example):
o {A} -> {C}: Support(A,C) = 2, Support(A) = 3, Confidence = 2/3 = 66.67%
(meets minimum confidence)
o {B} -> {C}: Support(B,C) = 2, Support(B) = 3, Confidence = 2/3 = 66.67%
(meets minimum confidence)
o {B} -> {F}: Support(B,F) = 2, Support(B) = 3, Confidence = 2/3 = 66.67%
(meets minimum confidence)
o {C} -> {A}: Support(A,C) = 2, Support(C) = 3, Confidence = 2/3 = 66.67%
(meets minimum confidence)
o {C} -> {B}: Support(B,C) = 2, Support(C) = 3, Confidence = 2/3 = 66.67%
(meets minimum confidence)
o {F} -> {B}: Support(B,F) = 2, Support(F) = 2, Confidence = 2/2 = 100% (meets
minimum confidence)
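The support and confidence values above are easy to verify programmatically. A small sketch using the five transactions from this example (the `support` helper is illustrative):

```python
# Transactions T1-T5 from the example above
transactions = [{'A','B','C'}, {'A','C'}, {'A','D'}, {'B','E','F'}, {'B','C','F'}]

def support(itemset):
    """Number of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)

print(support({'A'}))            # 3
print(support({'F'}))            # 2
print(support({'A', 'C'}))       # 2
print(support({'B', 'C', 'F'}))  # 1 -> below minimum support 2, so L3 is empty

# Confidence of {F} -> {B} is support(B,F) / support(F)
print(support({'B', 'F'}) / support({'F'}))  # 1.0 (100%)
```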

Example 1: Online Book Store

 Scenario:
o An online bookstore wants to analyze customer purchase patterns to recommend
books and create bundled offers.
o Transaction data: Each transaction represents a customer's order, containing the
books they purchased.
 Transactional Data:
o T1: {Novel, Mystery, Thriller}
o T2: {Novel, Science Fiction}
o T3: {Mystery, Cookbook}
o T4: {Novel, Mystery}
o T5: {Novel, Cookbook, Science Fiction}
 Process:
o 1. Setting Parameters:
 Minimum Support: 2 (a set must appear in at least 2 transactions)
 We will focus on finding the frequent itemsets.
o 2. Finding Frequent 1-Itemsets (L1):
 Count occurrences:
 Novel: 4
 Mystery: 3
 Thriller: 1
 Science Fiction: 2
 Cookbook: 2
 L1: {Novel}, {Mystery}, {Science Fiction}, {Cookbook} (Thriller is below
the support threshold)
o 3. Generating Candidate 2-Itemsets (C2):
 C2: {Novel, Mystery}, {Novel, Science Fiction}, {Novel, Cookbook},
{Mystery, Science Fiction}, {Mystery, Cookbook}, {Science Fiction,
Cookbook}
o 4. Finding Frequent 2-Itemsets (L2):
 Count occurrences:
 {Novel, Mystery}: 2 (T1, T4)
 {Novel, Science Fiction}: 2 (T2, T5)
 All other pairs appear at most once.
 L2: {Novel, Mystery}, {Novel, Science Fiction}
o 5. Generating Candidate 3-Itemsets (C3):
 The only candidate, {Novel, Mystery, Science Fiction}, never occurs, so
L3 is empty.
o 6. Results:
 Frequent Itemsets: {Novel}, {Mystery}, {Science Fiction}, {Cookbook},
{Novel, Mystery}, {Novel, Science Fiction}
o 7. Association Rules (Example):
 {Mystery} -> {Novel} (Confidence = 2/3): customers who buy Mystery
books often also buy Novels.
 {Science Fiction} -> {Novel} (Confidence = 2/2): every purchase of a
Science Fiction book also included a Novel.
 Application:
o The bookstore can recommend Novels to customers who purchase Mystery or
Science Fiction books.
o They could also create a bundled offer for Novels and Mystery books.

Example 2: Grocery Store Analysis

 Scenario:
o A grocery store wants to optimize product placement.
o Transaction data: Records of customer purchases.
 Transactional Data:
o T1: {Milk, Bread, Eggs}
o T2: {Milk, Yogurt}
o T3: {Bread, Diapers, Beer}
o T4: {Milk, Bread, Diapers, Yogurt}
o T5: {Yogurt, Eggs}
 Process:
o 1. Setting Parameters:
 Minimum Support: 2
o 2. Finding Frequent 1-Itemsets (L1):
 Milk: 3, Bread: 3, Eggs: 2, Yogurt: 3, Diapers: 2, Beer: 1.
 L1: {Milk}, {Bread}, {Eggs}, {Yogurt}, {Diapers}
o 3. Generating Candidate 2-Itemsets (C2):
 C2: {Milk, Bread}, {Milk, Eggs}, {Milk, Yogurt}, {Milk, Diapers},
{Bread, Eggs}, {Bread, Yogurt}, {Bread, Diapers}, {Eggs, Yogurt},
{Eggs, Diapers}, {Yogurt, Diapers}
o 4. Finding Frequent 2-Itemsets (L2):
 After counting support ({Milk, Bread}: 2, {Milk, Yogurt}: 2, {Bread,
Diapers}: 2; every other pair appears only once), L2: {Milk, Bread},
{Milk, Yogurt}, {Bread, Diapers}
o 5. Generating Candidate 3-Itemsets (C3):
 C3: {Milk, Bread, Yogurt}
o 6. Finding Frequent 3-Itemsets (L3):
 {Milk, Bread, Yogurt} appears only in T4 (support 1), so L3 is empty.
o 7. Results:
 Frequent Itemsets: {Milk}, {Bread}, {Eggs}, {Yogurt}, {Diapers}, {Milk,
Bread}, {Milk, Yogurt}, {Bread, Diapers}
o 8. Association Rules (Example):
 {Diapers} -> {Bread} (Confidence = 2/2 = 100%)
 Application:
o The store can place Bread and Diapers close together.
o Milk can be shelved near both Bread and Yogurt.
o This information can be used for sales promotions.

Key takeaways from these examples:

 The Apriori algorithm efficiently narrows down the search for frequent itemsets.
 The support threshold is crucial in determining which itemsets are considered frequent.
 Association rules provide valuable insights into relationships between items.

Example: E-commerce Website Purchase Analysis

 Scenario:
o An e-commerce website selling various electronics wants to analyze customer
purchase patterns to optimize product recommendations and promotional offers.
o The dataset includes transactions with items like laptops, headphones, keyboards,
mice, and external hard drives.
 Transactional Data:
o T1: {Laptop, Headphones, Keyboard}
o T2: {Laptop, Mouse}
o T3: {Headphones, External Hard Drive}
o T4: {Laptop, Headphones}
o T5: {Laptop, Keyboard, Mouse}
o T6: {Headphones, Keyboard}
o T7: {Laptop, Headphones, External Hard Drive}
o T8: {Keyboard, Mouse}
o T9: {Laptop, Mouse}
o T10: {Headphones, External Hard Drive, Keyboard}
 Parameters:
o Minimum Support: 3
o Minimum Confidence: 60%
 Process:
o 1. Finding Frequent 1-Itemsets (L1):
 Count occurrences:
 Laptop: 6
 Headphones: 6
 Keyboard: 5
 Mouse: 4
 External Hard Drive: 3
 L1: {Laptop}, {Headphones}, {Keyboard}, {Mouse}, {External Hard
Drive}
o 2. Generating Candidate 2-Itemsets (C2):
 C2: {Laptop, Headphones}, {Laptop, Keyboard}, {Laptop, Mouse},
{Laptop, External Hard Drive}, {Headphones, Keyboard}, {Headphones,
Mouse}, {Headphones, External Hard Drive}, {Keyboard, Mouse},
{Keyboard, External Hard Drive}, {Mouse, External Hard Drive}
o 3. Finding Frequent 2-Itemsets (L2):
 After counting support:
 {Laptop, Headphones}: 3
 {Laptop, Mouse}: 3
 {Headphones, Keyboard}: 3
 {Headphones, External Hard Drive}: 3
 All other pairs appear at most twice (e.g. {Laptop, Keyboard}: 2,
{Keyboard, Mouse}: 2).
 L2: {Laptop, Headphones}, {Laptop, Mouse}, {Headphones, Keyboard},
{Headphones, External Hard Drive}
o 4. Generating Candidate 3-Itemsets (C3):
 C3: {Laptop, Headphones, Mouse}, {Headphones, Keyboard, External
Hard Drive}
o 5. Finding Frequent 3-Itemsets (L3):
 After counting support:
 {Laptop, Headphones, Mouse}: 0
 {Headphones, Keyboard, External Hard Drive}: 1
 L3: {} because neither candidate reaches the minimum support of 3.
o 6. Generating Association Rules:
 From L2 (minimum confidence 60%):
 {Mouse} -> {Laptop}: Support = 3/10, Confidence = 3/4 = 75%
 {Keyboard} -> {Headphones}: Support = 3/10, Confidence = 3/5
= 60%
 {External Hard Drive} -> {Headphones}: Support = 3/10,
Confidence = 3/3 = 100%
 The reverse rules fail the threshold: {Laptop} -> {Mouse}, {Laptop}
-> {Headphones}, {Headphones} -> {Keyboard}, and {Headphones} ->
{External Hard Drive} each have Confidence = 3/6 = 50%.
 Results and Applications:
o The e-commerce website can use these rules to:
 Recommend laptops to customers who purchase mice.
 Recommend headphones to customers who buy keyboards or external hard
drives.
 Create bundled offers for laptops and mice.
 Place keyboards and headphones together on the webpage.
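Confidence values for candidate rules can be computed directly from the ten transactions above. A small verification sketch (the `support` helper is illustrative):

```python
# Transactions T1-T10 from the e-commerce example
transactions = [
    {'Laptop', 'Headphones', 'Keyboard'}, {'Laptop', 'Mouse'},
    {'Headphones', 'External Hard Drive'}, {'Laptop', 'Headphones'},
    {'Laptop', 'Keyboard', 'Mouse'}, {'Headphones', 'Keyboard'},
    {'Laptop', 'Headphones', 'External Hard Drive'}, {'Keyboard', 'Mouse'},
    {'Laptop', 'Mouse'}, {'Headphones', 'External Hard Drive', 'Keyboard'},
]

def support(itemset):
    """Number of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

# Confidence of A -> B is support(A ∪ B) / support(A)
print(support({'Mouse', 'Laptop'}) / support({'Mouse'}))      # 0.75
print(support({'External Hard Drive', 'Headphones'})
      / support({'External Hard Drive'}))                     # 1.0
print(support({'Laptop', 'Headphones'}) / support({'Laptop'}))  # 0.5, below 60%
```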

Problem Statement

In large transactional databases, identifying frequent itemsets (groups of items that appear
together frequently) is a critical challenge for applications like market basket analysis.
Traditional methods of finding frequent itemsets can be computationally expensive due to the
exponential number of possible item combinations.

Solution: Apriori Algorithm

The Apriori Algorithm efficiently finds frequent itemsets using the Apriori property, which
states that:

"A subset of a frequent itemset must also be frequent."

This property helps in reducing the search space by eliminating infrequent itemsets early.

Example of Apriori Algorithm

Step 1: Given Transaction Dataset

Consider the following transactions in a grocery store:

Transaction ID Items Purchased

1 {Milk, Bread, Butter}

2 {Milk, Bread}

3 {Milk, Butter}

4 {Bread, Butter}

5 {Milk, Bread, Butter}

Step 2: Set Minimum Support

Let's set the minimum support count = 2 (an itemset must appear in at least 2 transactions to be
considered frequent).

Step 3: Find Frequent Itemsets

1. Find Frequent 1-itemsets (Count occurrences):

{Milk} → 4
{Bread} → 4
{Butter} → 4

All are frequent (≥2).

2. Generate Candidate 2-itemsets & Count Support:

{Milk, Bread} → 3
{Milk, Butter} → 3
{Bread, Butter} → 3

All are frequent.

3. Generate Candidate 3-itemset & Count Support:

{Milk, Bread, Butter} → 2

Frequent.

4. No Larger Frequent Itemsets (the dataset contains only three distinct items, so the search stops).

Step 4: Generate Association Rules

Using a minimum confidence threshold, we generate rules like:

 {Milk, Bread} → {Butter} (Confidence = 2/3)


 {Milk, Butter} → {Bread} (Confidence = 2/3)
 {Bread, Butter} → {Milk} (Confidence = 2/3)

These rules help businesses understand purchasing patterns and suggest relevant items to
customers.
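Rule generation from a frequent itemset can be sketched as follows, using the five grocery transactions above (the `rules_from` helper and its names are illustrative): every non-empty split of the itemset into antecedent and consequent is tested against the confidence threshold.

```python
from itertools import combinations

# Transactions from the grocery example above
transactions = [{'Milk', 'Bread', 'Butter'}, {'Milk', 'Bread'}, {'Milk', 'Butter'},
                {'Bread', 'Butter'}, {'Milk', 'Bread', 'Butter'}]

def support(itemset):
    return sum(1 for t in transactions if set(itemset) <= t)

def rules_from(itemset, min_confidence):
    """Yield (antecedent, consequent, confidence) for every non-empty split
    of `itemset` that meets the confidence threshold."""
    items = set(itemset)
    for r in range(1, len(items)):
        for antecedent in combinations(items, r):
            conf = support(items) / support(antecedent)
            if conf >= min_confidence:
                yield set(antecedent), items - set(antecedent), conf

for a, c, conf in rules_from({'Milk', 'Bread', 'Butter'}, 0.6):
    print(a, '->', c, f'confidence={conf:.2f}')
# Yields the three rules above, each with confidence 2/3; the single-item
# antecedents ({Milk}, {Bread}, {Butter}) fall below 60% and are filtered out.
```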
Conclusion

The Apriori algorithm is an efficient way to extract frequent itemsets and generate association
rules, helping businesses make data-driven decisions in areas like recommendation systems,
inventory management, and cross-selling.

Example 1: Market Basket Analysis


Problem

A supermarket wants to analyze customer purchase patterns to offer better discounts and
improve recommendations.

Dataset (Transactions)
Transaction ID Items Purchased

1 {Apple, Banana, Milk}

2 {Apple, Banana, Bread}

3 {Apple, Milk}

4 {Banana, Bread, Milk}

5 {Apple, Banana, Bread, Milk}

Applying Apriori Algorithm

Minimum Support Threshold = 2

Step 1: Frequent 1-itemsets


Item Count

Apple 4

Banana 4

Milk 4

Bread 3
Step 2: Frequent 2-itemsets
Itemset Count

{Apple, Banana} 3

{Apple, Milk} 3

{Banana, Milk} 3

{Banana, Bread} 3

{Apple, Bread} 2

{Bread, Milk} 2

Step 3: Frequent 3-itemsets


Itemset Count

{Apple, Banana, Milk} 2

{Apple, Banana, Bread} 2

{Banana, Bread, Milk} 2

Step 4: Association Rules

1. {Apple, Banana} → {Milk} (Confidence = 2/3)


2. {Banana, Bread} → {Milk} (Confidence = 2/3)
3. {Banana, Milk} → {Apple} (Confidence = 2/3)

Business Insight

 Customers who buy Apple and Banana are likely to buy Milk.
 A discount on Banana & Bread could increase Milk sales.
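For a dataset this small, the frequent itemsets can also be found by brute force, which makes a useful cross-check of the tables above (a sketch; on real data the Apriori pruning is what keeps this tractable):

```python
from itertools import combinations

# Transactions from Example 1
transactions = [{'Apple', 'Banana', 'Milk'}, {'Apple', 'Banana', 'Bread'},
                {'Apple', 'Milk'}, {'Banana', 'Bread', 'Milk'},
                {'Apple', 'Banana', 'Bread', 'Milk'}]
min_support = 2

# Enumerate every possible itemset and keep those meeting min_support
items = sorted(set().union(*transactions))
frequent = {}
for k in range(1, len(items) + 1):
    for cand in combinations(items, k):
        s = sum(1 for t in transactions if set(cand) <= t)
        if s >= min_support:
            frequent[cand] = s

print(frequent[('Apple', 'Banana', 'Milk')])  # 2
print(frequent[('Banana', 'Bread', 'Milk')])  # 2
```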

Example 2: Online Retail Analysis


Problem

An e-commerce website wants to suggest related products based on previous customer purchases.

Dataset (Transactions)
Transaction ID Products Purchased

1 {Laptop, Mouse, Keyboard}



2 {Laptop, Mouse}

3 {Laptop, Keyboard}

4 {Mouse, Keyboard, Headphones}

5 {Laptop, Mouse, Keyboard, Headphones}

Applying Apriori Algorithm

Minimum Support Threshold = 2

Step 1: Frequent 1-itemsets


Item Count

Laptop 4

Mouse 4

Keyboard 4

Headphones 2

Step 2: Frequent 2-itemsets


Itemset Count

{Laptop, Mouse} 3

{Laptop, Keyboard} 3

{Mouse, Keyboard} 3

{Mouse, Headphones} 2

{Keyboard, Headphones} 2

Step 3: Frequent 3-itemsets


Itemset Count

{Laptop, Mouse, Keyboard} 2

{Mouse, Keyboard, Headphones} 2

Step 4: Association Rules

1. {Laptop, Mouse} → {Keyboard} (Confidence = 2/3)


2. {Mouse, Keyboard} → {Laptop} (Confidence = 2/3)
3. {Laptop} → {Mouse} (Confidence = 3/4)

Business Insight

 Customers buying a Laptop and Mouse often buy a Keyboard.


 Customers buying Keyboards & Mice frequently buy Laptops.
 The store can bundle Laptop, Mouse & Keyboard as a package deal.

Example 3: Medical Diagnosis


Problem

A hospital wants to analyze common symptoms that appear together to improve disease
diagnosis.

Dataset (Patients and Symptoms)


Patient ID Symptoms

1 {Fever, Cough, Fatigue}

2 {Fever, Cough}

3 {Cough, Fatigue}

4 {Fever, Fatigue}

5 {Fever, Cough, Fatigue}

Applying Apriori Algorithm

Minimum Support Threshold = 2

Step 1: Frequent 1-itemsets


Symptom Count

Fever 4

Cough 4

Fatigue 4
Step 2: Frequent 2-itemsets
Symptom Pair Count

{Fever, Cough} 3

{Fever, Fatigue} 3

{Cough, Fatigue} 3

Step 3: Frequent 3-itemsets


Symptom Group Count

{Fever, Cough, Fatigue} 2

Step 4: Association Rules

1. {Fever, Cough} → {Fatigue} (Confidence = 2/3)


2. {Fever, Fatigue} → {Cough} (Confidence = 2/3)
3. {Cough, Fatigue} → {Fever} (Confidence = 2/3)

Medical Insight

 Patients with Fever and Cough are likely to develop Fatigue.


 Doctors should monitor Cough & Fatigue for potential Fever onset.

Conclusion

The Apriori Algorithm helps uncover patterns in various domains:

1. Retail – Understanding product co-occurrence for discounts & recommendations.


2. E-commerce – Suggesting frequently bought products together.
3. Healthcare – Identifying symptom relationships for better diagnosis.

Example 1: Market Basket Analysis


Problem Statement

A supermarket wants to analyze customer purchase patterns to offer better discounts and
improve recommendations.

Step 1: Dataset Representation


Transactions:

Transaction ID Items Purchased

1 {Apple, Banana, Milk}

2 {Apple, Banana, Bread}

3 {Apple, Milk}

4 {Banana, Bread, Milk}

5 {Apple, Banana, Bread, Milk}

We assume a minimum support count of 2.

Step 2: Generate Frequent 1-itemsets


Count each item’s occurrence:

Item Count

Apple 4

Banana 4

Milk 4

Bread 3

All items are frequent (≥2).

Step 3: Generate Frequent 2-itemsets


Find combinations of two items and count occurrences.

Itemset Count

{Apple, Banana} 3

{Apple, Milk} 3

{Banana, Milk} 3

{Banana, Bread} 3

{Apple, Bread} 2

{Bread, Milk} 2

All 2-itemsets are frequent (≥2).

Step 4: Generate Frequent 3-itemsets


Find combinations of three items from the 2-itemsets.

Itemset Count

{Apple, Banana, Milk} 2

{Apple, Banana, Bread} 2

{Banana, Bread, Milk} 2

All are frequent.

Step 5: Generate Association Rules


Now, we generate rules using minimum confidence = 50%.

1. {Apple, Banana} → {Milk}


o Support = 2/5 = 40%
o Confidence = 2/3 = 66.7% ✅

2. {Banana, Bread} → {Milk}


o Support = 2/5 = 40%
o Confidence = 2/3 = 66.7% ✅

3. {Banana, Milk} → {Apple}


o Support = 2/5 = 40%
o Confidence = 2/3 = 66.7% ✅
Business Insights
 Customers buying Apple and Banana are likely to buy Milk.
 A discount on Banana & Bread could increase Milk sales.
 Suggest Apple to customers buying Banana & Milk.

Example 2: Online Retail Analysis


Problem Statement

An e-commerce website wants to suggest related products based on previous customer purchases.

Step 1: Dataset Representation


Transaction ID Products Purchased

1 {Laptop, Mouse, Keyboard}

2 {Laptop, Mouse}

3 {Laptop, Keyboard}

4 {Mouse, Keyboard, Headphones}

5 {Laptop, Mouse, Keyboard, Headphones}

Minimum support count = 2.

Step 2: Generate Frequent 1-itemsets


Item Count

Laptop 4

Mouse 4

Keyboard 4

Headphones 2

All are frequent.

Step 3: Generate Frequent 2-itemsets


Itemset Count

{Laptop, Mouse} 3

{Laptop, Keyboard} 3

{Mouse, Keyboard} 3

{Mouse, Headphones} 2

{Keyboard, Headphones} 2

All are frequent.

Step 4: Generate Frequent 3-itemsets


Itemset Count

{Laptop, Mouse, Keyboard} 2

{Mouse, Keyboard, Headphones} 2

Both are frequent.

Step 5: Generate Association Rules


1. {Laptop, Mouse} → {Keyboard}
o Support = 2/5 = 40%
o Confidence = 2/3 = 66.7% ✅
2. {Mouse, Keyboard} → {Laptop}
o Support = 2/5 = 40%
o Confidence = 2/3 = 66.7% ✅

3. {Laptop} → {Mouse}
o Support = 3/5 = 60%
o Confidence = 3/4 = 75% ✅

Business Insights
 Customers buying a Laptop and Mouse often buy a Keyboard.
 Customers buying Keyboards & Mice frequently buy Laptops.
 The store can bundle Laptop, Mouse & Keyboard as a package deal.

Example 3: Medical Diagnosis


Problem Statement

A hospital wants to analyze common symptoms that appear together to improve disease
diagnosis.

Step 1: Dataset Representation


Patient ID Symptoms

1 {Fever, Cough, Fatigue}

2 {Fever, Cough}

3 {Cough, Fatigue}

4 {Fever, Fatigue}

5 {Fever, Cough, Fatigue}

Minimum support count = 2.


Step 2: Generate Frequent 1-itemsets
Symptom Count

Fever 4

Cough 4

Fatigue 4

All are frequent.

Step 3: Generate Frequent 2-itemsets


Symptom Pair Count

{Fever, Cough} 3

{Fever, Fatigue} 3

{Cough, Fatigue} 3

All are frequent.

Step 4: Generate Frequent 3-itemsets


Symptom Group Count

{Fever, Cough, Fatigue} 2

This is frequent.

Step 5: Generate Association Rules


1. {Fever, Cough} → {Fatigue}
o Support = 2/5 = 40%
o Confidence = 2/3 = 66.7% ✅

2. {Fever, Fatigue} → {Cough}


o Support = 2/5 = 40%
o Confidence = 2/3 = 66.7% ✅

3. {Cough, Fatigue} → {Fever}


o Support = 2/5 = 40%
o Confidence = 2/3 = 66.7% ✅

Medical Insights
 Patients with Fever and Cough are likely to develop Fatigue.
 Doctors should monitor Cough & Fatigue for potential Fever onset.

Mining Single-Dimensional Boolean Association Rules from Transactional Databases
1. Introduction

Mining association rules is a key technique in data mining, used to discover relationships
between items in large transactional databases.
A single-dimensional Boolean association rule means:

 Single-dimensional: The rule involves only one attribute (e.g., "Items Purchased").
 Boolean: The presence (1) or absence (0) of an item is considered (no quantity or other
attributes).

For example, in a supermarket:

{Milk, Bread} → {Butter}


(If a customer buys Milk and Bread, they are likely to buy Butter)

2. Problem Definition
Given a transactional database, we aim to find frequent itemsets and derive association rules
based on the Apriori algorithm.

Transactional Database Example


Transaction ID Items Purchased
1 {Milk, Bread, Butter}
2 {Milk, Bread}
3 {Milk, Butter}
4 {Bread, Butter}
5 {Milk, Bread, Butter}

Goal

1. Find Frequent Itemsets


o Identify sets of items that frequently occur together.
2. Generate Association Rules
o Find relationships between itemsets using support and confidence.

3. Steps in Mining Boolean Association Rules


We follow two major steps:

Step 1: Finding Frequent Itemsets (Using Apriori Algorithm)

A frequent itemset is an itemset that appears in the transactions at least min_support times.

Step 2: Generating Association Rules

Association rules are derived from frequent itemsets and must meet min_confidence.

4. Step 1: Finding Frequent Itemsets


(a) Set Minimum Support

Let’s assume min_support = 2 (an itemset must appear in at least 2 transactions).

(b) Finding Frequent 1-itemsets

Count the frequency of individual items:

Item Frequency
Milk 4
Bread 4
Butter 4

All items occur at least 2 times, so they are frequent.

(c) Finding Frequent 2-itemsets

Find all possible pairs and count their occurrence:

Itemset Frequency
{Milk, Bread} 3
{Milk, Butter} 3
{Bread, Butter} 3

All itemsets appear ≥ 2 times, so they are frequent.

(d) Finding Frequent 3-itemsets

Find combinations of three items:

Itemset Frequency
{Milk, Bread, Butter} 2

This itemset is frequent (≥2).

5. Step 2: Generating Association Rules


We use minimum confidence (min_confidence = 60%) to filter the best rules.

Generating Rules from Frequent Itemsets

(a) From 2-itemsets

1. {Milk} → {Bread}
o Support = 3/5 = 60%
o Confidence = 3/4 = 75% ✅ (Accepted)
2. {Bread} → {Milk}
o Support = 3/5 = 60%
o Confidence = 3/4 = 75% ✅ (Accepted)
3. {Milk} → {Butter}
o Support = 3/5 = 60%
o Confidence = 3/4 = 75% ✅ (Accepted)

(b) From 3-itemsets

1. {Milk, Bread} → {Butter}


o Support = 2/5 = 40%
o Confidence = 2/3 = 66.7% ✅ (Accepted)
2. {Milk, Butter} → {Bread}
o Support = 2/5 = 40%
o Confidence = 2/3 = 66.7% ✅ (Accepted)

6. Interpretation and Business Insights


Insights for Supermarket

 Customers buying Milk and Bread are likely to buy Butter. → Recommend Butter to
such customers.
 Discount on Bread might increase Milk sales.
 Bundle Milk, Bread & Butter together as a combo deal.

7. Conclusion
 Single-dimensional Boolean association rules help find item relationships in
transactions.
 The Apriori Algorithm efficiently finds frequent itemsets.
 Support & Confidence metrics help derive strong association rules.

Mining Multi-Dimensional Association Rules from Relational Databases & Data Warehouses
1. Introduction
Multi-dimensional association rule mining extends traditional (single-dimensional) association
rule mining by incorporating multiple attributes from relational databases or data warehouses.

 Single-dimensional: Uses only one attribute (e.g., "Items Purchased") → {Milk,


Bread} → {Butter}
 Multi-dimensional: Uses multiple attributes (e.g., "Age", "Region", "Category") →
{Age: 20-30, Gender: Male, Item: Laptop} → {Item: Mouse}

Applications of Multi-Dimensional Association Rules

 Retail: Analyze how customer demographics influence purchases.


 Healthcare: Find disease patterns based on age, gender, and lifestyle.
 Finance: Identify fraud patterns based on location, transaction type, and amount.

2. Problem Definition
Example: Retail Transaction Database

A supermarket wants to analyze the influence of customer attributes on purchases.


We use a relational database with the following schema:

Transaction ID Customer Age Gender Location Purchased Items


1 25-34 Male New York {Milk, Bread}
2 35-44 Female Chicago {Milk, Butter}
3 25-34 Male New York {Milk, Bread, Butter}
4 45-54 Male Los Angeles {Bread, Butter}
5 25-34 Female Chicago {Milk, Bread}

Goal

1. Find frequent itemsets considering customer attributes.


2. Generate association rules based on demographic influence.

3. Types of Multi-Dimensional Association Rules


1. Inter-dimensional Association Rules
o Involve multiple attributes → {Age: 25-34, Location: New York} → {Milk,
Bread}
2. Hybrid-Dimensional Association Rules
o Involve both categorical attributes & transactional items → {Gender: Male,
Milk} → {Bread}
3. Quantitative Association Rules
o Use numeric ranges → {Age: 25-34} → {Milk, Bread}

4. Steps in Mining Multi-Dimensional Association Rules


Step 1: Transform Relational Data into Transaction Format

We represent data in a structured Boolean matrix:

Transaction ID | Age: 25-34 | Age: 35-44 | Age: 45-54 | Male | Female | NY | Chicago | LA | Milk | Bread | Butter
1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0
2 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1
3 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1
4 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1
5 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0

Each 1 represents the presence of an attribute/item in a transaction.
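The transformation from relational rows into this Boolean transaction format can be sketched as follows. Each attribute value becomes an "Age: …"-style token, so multi-dimensional mining reduces to ordinary itemset mining (field names and token formats here are illustrative):

```python
# Relational rows from the retail example above
rows = [
    {'age': '25-34', 'gender': 'Male',   'location': 'New York',    'items': {'Milk', 'Bread'}},
    {'age': '35-44', 'gender': 'Female', 'location': 'Chicago',     'items': {'Milk', 'Butter'}},
    {'age': '25-34', 'gender': 'Male',   'location': 'New York',    'items': {'Milk', 'Bread', 'Butter'}},
    {'age': '45-54', 'gender': 'Male',   'location': 'Los Angeles', 'items': {'Bread', 'Butter'}},
    {'age': '25-34', 'gender': 'Female', 'location': 'Chicago',     'items': {'Milk', 'Bread'}},
]

def to_transaction(row):
    # Attribute values become items alongside the purchased products
    tokens = {f'Age: {row["age"]}', f'Gender: {row["gender"]}',
              f'Location: {row["location"]}'}
    return tokens | row['items']

transactions = [to_transaction(r) for r in rows]

# Boolean matrix: one column per attribute-value token or item
columns = sorted(set().union(*transactions))
matrix = [[1 if c in t else 0 for c in columns] for t in transactions]

# Support counting now works exactly as in the single-dimensional case,
# e.g. support of {Age: 25-34, Milk}:
print(sum(1 for t in transactions if {'Age: 25-34', 'Milk'} <= t))  # 3
```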

Step 2: Apply Apriori Algorithm to Find Frequent Itemsets

(a) Set Minimum Support

Assume min_support = 2.

(b) Frequent 1-itemsets

Itemset Frequency
Age: 25-34 3
Male 3
Female 2
NY 2
Chicago 2
Milk 4
Bread 4
Butter 3

All are frequent (≥2).


(c) Frequent 2-itemsets

Itemset Frequency
{Age: 25-34, Milk} 3
{Male, Bread} 3
{Milk, Bread} 3
{NY, Milk} 2
{Chicago, Milk} 2

All are frequent.

(d) Frequent 3-itemsets

Itemset Frequency
{Age: 25-34, Milk, Bread} 3
{Male, Milk, Bread} 2

All are frequent.

Step 3: Generate Association Rules

We use min_confidence = 60%.

Rules from 2-itemsets

1. {Age: 25-34} → {Milk}


o Confidence = 3/3 = 100% ✅
2. {Male} → {Bread}
o Confidence = 3/3 = 100% ✅
3. {Milk} → {Bread}
o Confidence = 3/4 = 75% ✅

Rules from 3-itemsets

1. {Age: 25-34, Milk} → {Bread}


o Confidence = 3/3 = 100% ✅
2. {Male, Milk} → {Bread}
o Confidence = 2/2 = 100% ✅
5. Interpretation & Business Insights
Insights for Supermarket

1. Younger customers (25-34) are highly likely to buy Milk.


→ Advertise Milk deals to this age group.
2. Males are more likely to buy Bread.
→ Offer combo discounts on Bread for male shoppers.
3. If a customer buys Milk, they are likely to buy Bread.
→ Bundle Milk & Bread together.

6. Conclusion
 Multi-dimensional association rule mining helps uncover hidden relationships between
multiple attributes.
 Relational databases are transformed into Boolean transaction format for mining.
 The Apriori Algorithm is applied to find frequent itemsets & rules.
 Business insights allow for better marketing strategies.

More Examples of Multi-Dimensional Association Rule Mining

Here are some additional real-world examples where multi-dimensional association rule mining
is applied:

1. Healthcare - Disease Prediction


Dataset Example

Patient ID Age Group Gender Smoking Exercise Disease


1 40-50 Male Yes No Heart Disease
2 30-40 Female No Yes No Disease
3 40-50 Male Yes No Heart Disease
4 50-60 Female No No Diabetes
5 30-40 Male Yes Yes No Disease

Frequent Itemsets
 {Smoking: Yes, Exercise: No} → {Heart Disease}
 {Age: 40-50, Smoking: Yes} → {Heart Disease}
 {Age: 50-60, No Exercise} → {Diabetes}

Insights

 Patients who smoke and do not exercise are highly likely to develop heart disease. →
Preventive health programs should target this group.
 Elderly patients (50-60) with no exercise habits have a higher risk of diabetes. →
Health awareness campaigns should focus on promoting exercise.

2. Banking - Credit Card Fraud Detection


Dataset Example

Transaction ID Age Group Region Online Purchase Large Amount Fraudulent?


1 20-30 USA Yes Yes Yes
2 30-40 Canada No No No
3 20-30 USA Yes Yes Yes
4 40-50 UK No Yes No
5 20-30 USA Yes No No

Frequent Itemsets
 {Age: 20-30, Online Purchase: Yes, Large Amount: Yes} → {Fraudulent}
 {Region: USA, Online Purchase: Yes} → {Fraudulent}

Insights

 Young customers (20-30) making large online purchases are at high risk of fraud. →
Banks should flag such transactions for verification.
 Online transactions from the USA show a higher fraud rate. → Implement additional
security for online transactions in this region.

3. E-commerce - Product Recommendation


Dataset Example

User ID Age Group Gender Browsed Category Purchased Category


1 20-30 Male Electronics Accessories
2 30-40 Female Clothing Shoes
3 20-30 Male Electronics Accessories
4 40-50 Female Books Books
5 20-30 Male Electronics Electronics

Frequent Itemsets
 {Age: 20-30, Browsed: Electronics} → {Purchased: Accessories}
 {Age: 30-40, Browsed: Clothing} → {Purchased: Shoes}

Insights

 Young males (20-30) browsing electronics often buy accessories. → Show targeted
ads for accessories when they browse electronics.
 Women aged 30-40 browsing clothing often buy shoes. → Suggest shoe
recommendations during clothing purchases.

4. Supermarket - Seasonal Purchase Trends


Dataset Example

Transaction ID Season Customer Type Product Purchased


1 Winter Tourist Jacket
2 Summer Local Cold Drinks
3 Winter Local Heater
4 Summer Tourist Sunglasses
5 Winter Tourist Jacket

Frequent Itemsets
 {Season: Winter, Customer: Tourist} → {Jacket}
 {Season: Summer, Customer: Local} → {Cold Drinks}

Insights

 Tourists in winter mostly buy jackets. → Offer discount bundles for jackets to tourists.
 Locals in summer buy cold drinks. → Increase cold drink promotions in local stores
during summer.

5. Social Media - Ad Targeting


Dataset Example

User ID Age Group Gender Interests Clicked Ad?


1 20-30 Male Sports Yes
2 30-40 Female Cooking No
3 20-30 Male Fitness Yes
4 40-50 Male Business No
5 20-30 Male Sports Yes

Frequent Itemsets
 {Age: 20-30, Gender: Male, Interest: Sports} → {Clicked Ad}
 {Interest: Fitness} → {Clicked Ad}

Insights

 Young males (20-30) interested in sports are more likely to click on ads. → Target
this group for sports product ads.
 Users interested in fitness click on ads frequently. → Increase fitness-related ad
campaigns.

Conclusion

 Multi-dimensional association rule mining uncovers hidden insights by analyzing multiple attributes.
 These rules are valuable in marketing, fraud detection, healthcare, and e-commerce.
 Businesses can optimize strategies based on demographic, geographic, and behavioral
patterns.

Detailed Explanation of Multi-Dimensional Association Rule Mining for the Above Examples
Multi-dimensional association rule mining involves extracting hidden patterns from databases
with multiple attributes. We will go through a detailed step-by-step explanation using one of the
examples: E-commerce - Product Recommendation.

1. Understanding the Dataset


In an E-commerce platform, we analyze user behavior based on attributes like age, gender,
browsing history, and purchased products.

Example Dataset: E-commerce Transactions

User ID Age Group Gender Browsed Category Purchased Category


1 20-30 Male Electronics Accessories
2 30-40 Female Clothing Shoes
3 20-30 Male Electronics Accessories
4 40-50 Female Books Books
5 20-30 Male Electronics Electronics

Goal

 Identify patterns between user demographics and purchases.


 Generate association rules to optimize product recommendations.

2. Transforming the Data into Boolean Format
To apply the Apriori algorithm, we convert the categorical attributes into Boolean format.

User ID | Age: 20-30 | Age: 30-40 | Age: 40-50 | Male | Female | Browsed: Electronics | Browsed: Clothing | Browsed: Books | Purchased: Accessories | Purchased: Shoes | Purchased: Books | Purchased: Electronics
1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0
2 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0
3 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0
4 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0
5 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1

Each 1 indicates the presence of a characteristic in the transaction.
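This transformation can be sketched in a few lines of Python. The snippet below is a minimal illustration (the record fields mirror the five-row sample dataset; `to_boolean` is a hypothetical helper name):

```python
# A minimal sketch of the Boolean (one-hot) transformation described above:
# each record becomes the set of "Attribute: value" items it contains.
# The records mirror the five-row e-commerce dataset.

records = [
    {"Age": "20-30", "Gender": "Male",   "Browsed": "Electronics", "Purchased": "Accessories"},
    {"Age": "30-40", "Gender": "Female", "Browsed": "Clothing",    "Purchased": "Shoes"},
    {"Age": "20-30", "Gender": "Male",   "Browsed": "Electronics", "Purchased": "Accessories"},
    {"Age": "40-50", "Gender": "Female", "Browsed": "Books",       "Purchased": "Books"},
    {"Age": "20-30", "Gender": "Male",   "Browsed": "Electronics", "Purchased": "Electronics"},
]

def to_boolean(records):
    # Turn each record into the set of "Attribute: value" items it contains.
    return [{f"{attr}: {val}" for attr, val in r.items()} for r in records]

transactions = to_boolean(records)
print(sorted(transactions[0]))
# ['Age: 20-30', 'Browsed: Electronics', 'Gender: Male', 'Purchased: Accessories']
```

Representing each row as a set of "Attribute: value" items is exactly what lets a single-dimensional algorithm like Apriori run over multi-dimensional data.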

3. Finding Frequent Itemsets (Using the Apriori Algorithm)
Step 1: Set Minimum Support
We assume min_support = 2 (an itemset must appear in at least 2 transactions).

Step 2: Finding Frequent 1-itemsets


Count the frequency of individual attributes.

Itemset Frequency
Age: 20-30 3
Male 3
Browsed: Electronics 3
Purchased: Accessories 2

All these itemsets occur ≥ 2 times, so they are frequent.

Step 3: Finding Frequent 2-itemsets


Find combinations of two attributes.

Itemset Frequency
{Age: 20-30, Browsed: Electronics} 3
{Age: 20-30, Male} 3
{Male, Browsed: Electronics} 3
{Browsed: Electronics, Purchased: Accessories} 2

All appear at least 2 times, so they are frequent.

Step 4: Finding Frequent 3-itemsets


Find combinations of three attributes.

Itemset Frequency
{Age: 20-30, Browsed: Electronics, Purchased: Accessories} 2
{Male, Browsed: Electronics, Purchased: Accessories} 2

Both itemsets occur ≥ 2 times, so they are frequent.


4. Generating Association Rules
We use minimum confidence = 60% to filter strong rules.

Step 1: Rules from 2-itemsets


1. {Age: 20-30} → {Browsed: Electronics}
o Support = 3/5 = 60%
o Confidence = 3/3 = 100% ✅ (Accepted)
2. {Browsed: Electronics} → {Purchased: Accessories}
o Support = 2/5 = 40%
o Confidence = 2/3 = 66.7% ✅ (Accepted)
3. {Male} → {Browsed: Electronics}
o Support = 3/5 = 60%
o Confidence = 3/3 = 100% ✅ (Accepted)

Step 2: Rules from 3-itemsets


1. {Age: 20-30, Browsed: Electronics} → {Purchased: Accessories}
o Support = 2/5 = 40%
o Confidence = 2/3 = 66.7% ✅ (Accepted)
2. {Male, Browsed: Electronics} → {Purchased: Accessories}
o Support = 2/5 = 40%
o Confidence = 2/3 = 66.7% ✅ (Accepted)
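The support and confidence figures above can be recomputed programmatically from the Boolean transactions. The sketch below is a minimal illustration (`rule_metrics` is a hypothetical helper, not a library function):

```python
# Sketch: recomputing the rule metrics above from the Boolean transactions.
# The five sets mirror the e-commerce table rows.

transactions = [
    {"Age: 20-30", "Male", "Browsed: Electronics", "Purchased: Accessories"},
    {"Age: 30-40", "Female", "Browsed: Clothing", "Purchased: Shoes"},
    {"Age: 20-30", "Male", "Browsed: Electronics", "Purchased: Accessories"},
    {"Age: 40-50", "Female", "Browsed: Books", "Purchased: Books"},
    {"Age: 20-30", "Male", "Browsed: Electronics", "Purchased: Electronics"},
]

def count(itemset):
    # Number of transactions containing every item in `itemset`.
    return sum(1 for t in transactions if itemset <= t)

def rule_metrics(antecedent, consequent):
    # Returns (support, confidence) for the rule antecedent -> consequent.
    both = count(antecedent | consequent)
    return both / len(transactions), both / count(antecedent)

sup, conf = rule_metrics({"Age: 20-30"}, {"Browsed: Electronics"})
print(sup, conf)   # 0.6 1.0  -> support 60%, confidence 100%

sup, conf = rule_metrics({"Browsed: Electronics"}, {"Purchased: Accessories"})
print(round(sup, 2), round(conf, 3))   # 0.4 0.667
```

The printed values match the accepted rules in steps 1 and 2 above.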

5. Business Insights & Interpretation


Key Insights for the E-commerce Store

1. Young males (20-30) often browse electronics.


→ Targeted advertisements should focus on electronics for this age group.
2. Customers who browse electronics are likely to buy accessories.
→ Recommend accessories when a user views electronics.
3. Males have a high tendency to browse electronics.
→ Increase male-focused electronics promotions.
4. Users in the 20-30 age group who browse electronics frequently buy accessories.
→ Bundle accessories with electronics for better sales.
6. Conclusion
 Multi-dimensional association rule mining uncovers hidden relationships between
user demographics and purchasing behavior.
 The Apriori algorithm is used to find frequent itemsets and generate rules.
 Businesses can use this analysis for targeted marketing, product recommendations,
and sales optimization.

 Transactional Databases:
o These databases store records of transactions, where each transaction consists of a
set of items.
o For example, a supermarket's transactional database would record each customer's
purchase as a separate transaction, with the items bought listed within that
transaction.
 Boolean Association Rules:
o These rules deal with the presence or absence of items, rather than their
quantities.
o They express relationships in the form of "if A, then B," where A and B are sets
of items.
o "Boolean" signifies that we're concerned with whether an item is present (true) or
absent (false) in a transaction.
 Single-Dimensional:
o This means that the association rules involve only one attribute or dimension. In
the context of market basket analysis, this typically refers to the items purchased.
o Contrast this with multi-dimensional association rules, which might involve
attributes like customer age, income, or location.
 Association Rule Mining:
o The goal is to discover interesting relationships or patterns between items in a
transactional database.
o These relationships are expressed as association rules.

Key Metrics:

 Support:
o The proportion of transactions that contain both itemset A and itemset B.
o Support(A → B) = (Number of transactions containing A ∪ B) / (Total number of transactions)
o It measures how frequently the rule applies to the dataset.
 Confidence:
o The proportion of transactions that contain itemset A that also contain itemset B.
o Confidence(A → B) = (Number of transactions containing A ∪ B) / (Number of transactions containing A)
o It measures how often the rule is true.
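These two formulas translate directly into code. The snippet below is a hypothetical sketch (function names and the toy database `db` are illustrative, not from any library):

```python
# Hypothetical helper functions implementing the two metric formulas above;
# each transaction is a set of item names.

def support(transactions, itemset):
    # Fraction of transactions containing every item in `itemset`.
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(transactions, antecedent, consequent):
    # support(A ∪ B) / support(A) for the rule A -> B.
    a, b = set(antecedent), set(consequent)
    return support(transactions, a | b) / support(transactions, a)

# Toy basket database:
db = [{"Bread", "Milk"}, {"Bread", "Beer"}, {"Milk", "Beer"}, {"Bread", "Milk"}]
print(support(db, {"Bread", "Milk"}))                 # 0.5
print(round(confidence(db, {"Bread"}, {"Milk"}), 3))  # 0.667
```

Here {Bread, Milk} appears in 2 of 4 baskets (support 50%), and 2 of the 3 baskets containing Bread also contain Milk (confidence ≈ 66.7%).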

Example:
Let's consider a simple transactional database of a small grocery store:

 Transactions:
o T1: {Bread, Milk}
o T2: {Bread, Diapers, Beer, Eggs}
o T3: {Milk, Diapers, Beer, Cola}
o T4: {Bread, Milk, Diapers, Beer}
o T5: {Bread, Milk, Diapers, Cola}

Mining Single-Dimensional Boolean Association Rules:

1. Frequent Itemsets:
o We need to find itemsets that appear frequently in the transactions.
o Let's set a minimum support threshold of 3.
o Frequent itemsets:
 {Bread}: 4
 {Milk}: 4
 {Diapers}: 4
 {Beer}: 3
 {Bread, Milk}: 3
 {Bread, Diapers}: 3
 {Milk, Diapers}: 3
 {Diapers, Beer}: 3 (note: {Bread, Milk, Diapers} appears in only 2 transactions, so it is not frequent at this threshold)
2. Generating Association Rules:
o From the frequent itemsets, we can generate association rules.
o Let's set a minimum confidence threshold of 60%.
o Example rules:
 {Bread} → {Milk}:
 Support = 3/5 = 60%
 Confidence = 3/4 = 75%
 {Diapers} → {Beer}:
 Support = 3/5 = 60%
 Confidence = 3/4 = 75%
 {Milk} -> {Diapers}:
 Support = 3/5 = 60%
 Confidence = 3/4 = 75%
 {Bread} -> {Diapers}:
 Support = 3/5 = 60%
 Confidence = 3/4 = 75%
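The rule metrics above can be verified directly against the five transactions. The sketch below is a minimal illustration (`metrics` is a hypothetical helper):

```python
# Sketch: checking the example rules against transactions T1-T5.

transactions = [
    {"Bread", "Milk"},                        # T1
    {"Bread", "Diapers", "Beer", "Eggs"},     # T2
    {"Milk", "Diapers", "Beer", "Cola"},      # T3
    {"Bread", "Milk", "Diapers", "Beer"},     # T4
    {"Bread", "Milk", "Diapers", "Cola"},     # T5
]

def metrics(antecedent, consequent):
    # Returns (support, confidence) for the rule antecedent -> consequent.
    n = len(transactions)
    both = sum(1 for t in transactions if antecedent | consequent <= t)
    ante = sum(1 for t in transactions if antecedent <= t)
    return both / n, both / ante

print(metrics({"Bread"}, {"Milk"}))      # (0.6, 0.75)
print(metrics({"Diapers"}, {"Beer"}))    # (0.6, 0.75)
```

Both rules meet the 60% minimum confidence threshold set above.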

Interpretation:

 The rule "{Bread} → {Milk}" with 75% confidence means that 75% of the customers
who bought bread also bought milk.
 The rule "{Diapers} -> {Beer}" with 75% confidence means that 75% of the customers
that bought diapers also bought beer.
 The rule "{Bread} -> {Diapers}" with 75% confidence means that 75% of the customers who bought bread also bought diapers.

Applications:

 Market Basket Analysis:


o Identifying products that are often purchased together.
o Optimizing product placement in stores.
 Recommendation Systems:
o Suggesting items to customers based on their past purchases.
 Cross-Selling:
o Promoting related products to customers.

By mining single-dimensional Boolean association rules, businesses can gain valuable insights
into customer behavior and optimize their strategies.

Understanding Multi-Dimensional Association Rules:

 Beyond Simple Transactions:


o Unlike basic market basket analysis, which focuses solely on "items bought,"
multi-dimensional association rules consider various attributes or dimensions.
o These dimensions can include:
 Customer demographics (age, income, location)
 Product characteristics (brand, category, price)
 Time-related information (day of the week, season)
 Relational Databases and Data Warehouses:
o These systems store data in structured tables, allowing us to analyze relationships
between multiple attributes.
o Data warehouses, in particular, are designed for analytical purposes, providing
summarized and integrated data that's ideal for multi-dimensional analysis.
 Quantitative and Categorical Attributes:
o Multi-dimensional rules can involve both:
 Categorical attributes: Discrete values (e.g., "color = red," "city =
London").
 Quantitative attributes: Numerical values (e.g., "age > 30," "price <
$100").
o One of the challenges is how to handle the quantitative attributes. Often they are
"discretized" or placed into bins, to be able to be used in the association rule
mining.

Key Concepts and Challenges:

 Discretization:
o Converting continuous quantitative attributes into discrete intervals.
o This can be done statically (before mining) or dynamically (during mining).
o Example: "age" can be discretized into "age = 18-25," "age = 26-35," etc.
 Concept Hierarchies:
o Organizing attributes into hierarchical levels of abstraction.
o Example: "location" can have levels like "city," "state," and "country."
o This allows us to discover rules at different levels of detail.
 Mining Techniques:
o Adapting traditional association rule algorithms (like Apriori) to handle multiple
dimensions.
o Developing new techniques that can efficiently handle the increased complexity.
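As a small illustration of static discretization, the quantitative attribute "age" can be mapped into labelled bins before mining. The bin boundaries below are arbitrary examples, not prescribed by any standard:

```python
# Minimal static-discretization sketch: mapping a quantitative attribute
# (age) into the labelled interval items used by association rules.
# Bin boundaries are illustrative assumptions.

def discretize_age(age):
    bins = [(18, 25, "age = 18-25"), (26, 35, "age = 26-35"), (36, 50, "age = 36-50")]
    for lo, hi, label in bins:
        if lo <= age <= hi:
            return label
    return "age = other"

print(discretize_age(30))   # age = 26-35
print(discretize_age(22))   # age = 18-25
```

After this step the discretized labels behave like any other categorical item, so standard frequent-itemset algorithms apply unchanged.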

Example:

Imagine a retail data warehouse with the following dimensions:

 Customer: age, income, location


 Product: category, brand, price
 Time: day of week, season

We might want to discover rules like:

 "IF (customer.age = 20-30) AND (product.category = 'electronics') THEN (product.brand = 'XYZ') (support = x%, confidence = y%)"
 "IF (customer.location = 'California') AND (time.season = 'Winter') THEN (product.category = 'Winter apparel') (support = x%, confidence = y%)"
 "IF (customer.income = 'high') AND (product.price = 'high') THEN (product.category = 'luxury goods') (support = x%, confidence = y%)"

Process:

1. Data Preprocessing:
o Clean and transform the data.
o Discretize quantitative attributes.
o Define concept hierarchies.
2. Frequent Itemset Generation:
o Find combinations of attribute-value pairs that meet a minimum support
threshold.
o This involves considering combinations across multiple dimensions.
3. Rule Generation:
o Generate association rules from the frequent itemsets.
o Calculate confidence and other metrics.
4. Rule Evaluation:
o Filter out uninteresting or redundant rules.
o Use domain knowledge to evaluate the significance of the rules.

Applications:

 Targeted Marketing:
o Identifying customer segments that are likely to respond to specific promotions.
 Sales Forecasting:
o Predicting product demand based on multiple factors.
 Fraud Detection:
o Identifying unusual patterns that may indicate fraudulent activity.
 Business Intelligence:
o Gaining deeper insights into business trends and customer behavior.

By mining multi-dimensional association rules, organizations can extract valuable knowledge from their data warehouses and make more informed decisions.

Explain with an example

Let's create a more concrete example to illustrate multi-dimensional association rule mining.

Scenario: Electronics Retailer Data Warehouse

We have a data warehouse for an electronics retailer with the following dimensions:

 Customer:
o Age (quantitative, discretized into ranges)
o Location (categorical: City)
 Product:
o Category (categorical)
o Price (quantitative, discretized into ranges)
 Time:
o Season (categorical)

Sample Data (Simplified):

Customer Age Customer Location Product Category Product Price Season


20-30 London Laptop High Winter
30-40 New York Headphones Medium Summer
20-30 London Mouse Low Winter
40-50 Tokyo Camera High Spring
20-30 London Laptop Medium Winter
30-40 New York Laptop High Summer
20-30 Tokyo Headphones Medium Spring
40-50 London Camera Medium Winter
30-40 New York Headphones Medium Summer
20-30 London Laptop High Winter

Mining Process:
1. Data Preprocessing:
o Age is discretized into ranges (20-30, 30-40, 40-50).
o Price is discretized into ranges (Low, Medium, High).
o Other attributes are already categorical.
2. Frequent Itemset Generation:
o Let's set a minimum support threshold of 3 (meaning an itemset must appear in at
least 3 transactions).
o We'll look for combinations of attribute-value pairs.
o Some frequent itemsets:
 {Customer Age = 20-30}: 5
 {Customer Location = London}: 5
 {Product Category = Laptop}: 4
 {Product Price = High}: 4
 {Season = Winter}: 5
 {Customer Age = 20-30, Customer Location = London}: 4
 {Customer Age = 20-30, Product Category = Laptop}: 3
 {Customer Age = 20-30, Season = Winter}: 4
 {Customer Location = London, Product Category = Laptop}: 3
 {Customer Location = London, Season = Winter}: 5
 {Product Category = Laptop, Product Price = High}: 3
 {Product Category = Laptop, Season = Winter}: 3
3. Rule Generation:
o Let's set a minimum confidence threshold of 60%.
o Example rules:
 {Customer Age = 20-30} -> {Customer Location = London} (support =
4/10, confidence = 4/5 = 80%)
 {Customer Age = 20-30} -> {Product Category = Laptop} (support =
3/10, confidence = 3/5 = 60%)
 {Product Category = Laptop} -> {Product Price = High} (support = 3/10,
confidence = 3/4 = 75%)
 {Customer Age = 20-30, Customer Location = London} -> {Product
Category = Laptop} (support = 3/10, confidence = 3/4 = 75%)
 {Customer Age = 20-30, Customer Location = London} -> {Season =
Winter} (support = 4/10, confidence = 4/4 = 100%)
4. Rule Interpretation:
o "80% of customers aged 20-30 are located in London."
o "60% of customers aged 20-30 purchase Laptops."
o "75% of Laptop purchases are high priced."
o "75% of customers aged 20-30 and living in London buy laptops."
o "All customers aged 20-30 who live in London made their purchases during the winter season."
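The support counts in step 2 can be recomputed by treating each table row as a set of "dimension=value" items. This is a hypothetical sketch (the `dims` names are shorthand for the table's column headers):

```python
# Sketch: recomputing multi-dimensional support counts from the retailer
# table; each row becomes a set of "dimension=value" items.

rows = [
    ("20-30", "London",   "Laptop",     "High",   "Winter"),
    ("30-40", "New York", "Headphones", "Medium", "Summer"),
    ("20-30", "London",   "Mouse",      "Low",    "Winter"),
    ("40-50", "Tokyo",    "Camera",     "High",   "Spring"),
    ("20-30", "London",   "Laptop",     "Medium", "Winter"),
    ("30-40", "New York", "Laptop",     "High",   "Summer"),
    ("20-30", "Tokyo",    "Headphones", "Medium", "Spring"),
    ("40-50", "London",   "Camera",     "Medium", "Winter"),
    ("30-40", "New York", "Headphones", "Medium", "Summer"),
    ("20-30", "London",   "Laptop",     "High",   "Winter"),
]
dims = ("Age", "Location", "Category", "Price", "Season")
transactions = [{f"{d}={v}" for d, v in zip(dims, row)} for row in rows]

def count(itemset):
    # Number of rows containing every dimension=value item.
    return sum(1 for t in transactions if itemset <= t)

print(count({"Age=20-30", "Location=London"}))                     # 4
print(count({"Age=20-30", "Location=London", "Category=Laptop"}))  # 3
```

The counts 4 and 3 match the support figures behind the rules {Customer Age = 20-30, Customer Location = London} → {Product Category = Laptop} and {Season = Winter}.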

Applications:

 Targeted Promotions:
o The retailer can target 20-30-year-olds in London with laptop promotions during
the winter.
 Product Placement:
o High-priced laptops can be prominently displayed in stores or online for 20-30-
year-old customers.
 Inventory Management:
o Ensure sufficient stock of laptops in London during the winter season.

This example shows how multi-dimensional association rule mining can uncover valuable
insights that go beyond simple item-to-item relationships.

Constraint-Based Association Mining is a specialized approach in association rule mining where constraints are used to guide the discovery of interesting patterns. Instead of mining all possible associations, constraints help filter out uninteresting or redundant rules, making the process more efficient and relevant.

Key Concepts

1. Association Rule Mining: Finds relationships between items in large datasets.


o Example: If a customer buys bread and butter, they are likely to buy jam too.

2. Constraints: Conditions or restrictions applied during mining to improve relevance and


efficiency.
o Examples:
 Item Constraints: Only consider rules with certain items.
 Length Constraints: Restrict the number of items in a rule.
 Support Constraints: Require rules to meet a minimum frequency threshold.
 Interestingness Constraints: Apply measures like confidence or lift to ensure
valuable rules.

Types of Constraints

1. Knowledge-Based Constraints: Based on domain knowledge (e.g., only mining rules related to
electronics).
2. Data-Based Constraints: Based on data properties (e.g., ignore infrequent items).
3. Interestingness Constraints: Based on statistical measures like support, confidence, and lift.
4. Monotonic Constraints: if an itemset satisfies the constraint, then every superset of it also satisfies the constraint (adding items can never break it).
5. Anti-Monotonic Constraints: if an itemset violates the constraint, then every superset of it also violates the constraint (adding items can never restore it).

Benefits

 Efficiency: Reduces computation time by pruning irrelevant rules.


 Relevance: Ensures that the discovered rules are meaningful.
 Scalability: Works better with large datasets.
Applications

 Market Basket Analysis (Retail)


 Fraud Detection (Banking)
 Medical Diagnosis (Healthcare)
 Web Usage Mining (E-commerce)

Constraint-Based Association Mining – A Detailed Explanation with Examples

Constraint-based association rule mining enhances the traditional association rule mining process
by applying various constraints that refine the results. Let’s go through different types of
constraints with detailed examples.

1. Item Constraints
Item constraints restrict the rules to contain specific items or exclude certain items.

Example:

In a supermarket dataset, assume the transactions contain the following items:


{Milk, Bread, Butter, Jam, Cheese, Eggs, Juice, Cereal}

 Include Constraint: Find association rules that must contain "Milk".


o Valid Rule: {Milk, Bread} → {Butter}
o Invalid Rule: {Bread} → {Butter} (Doesn’t contain "Milk")

 Exclude Constraint: Find rules that must not contain "Juice".


o Valid Rule: {Bread, Milk} → {Butter}
o Invalid Rule: {Juice, Bread} → {Cereal} (Contains "Juice")

✅ Use Case: If a retailer wants to analyze only dairy product purchases, they can apply an item
constraint to filter out non-dairy products.
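Include and exclude constraints are simple set-membership filters over candidate rules. The sketch below is illustrative (the rule list is hypothetical; each rule is an antecedent/consequent pair):

```python
# Sketch: applying include/exclude item constraints to candidate rules.
# Each rule is an (antecedent, consequent) pair of frozensets.

rules = [
    (frozenset({"Milk", "Bread"}), frozenset({"Butter"})),
    (frozenset({"Bread"}), frozenset({"Butter"})),
    (frozenset({"Juice", "Bread"}), frozenset({"Cereal"})),
]

def items_of(rule):
    # All items appearing anywhere in the rule.
    antecedent, consequent = rule
    return antecedent | consequent

# Include constraint: the rule must contain "Milk".
must_contain_milk = [r for r in rules if "Milk" in items_of(r)]
# Exclude constraint: the rule must not contain "Juice".
no_juice = [r for r in rules if "Juice" not in items_of(r)]

print(len(must_contain_milk), len(no_juice))   # 1 2
```

In practice such constraints are pushed into candidate generation rather than applied afterwards, which is where the efficiency gain comes from.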

2. Length Constraints
These constraints control the number of items in association rules.

Example:

 Minimum Length Constraint: Only consider rules with at least 3 items.


o Valid Rule: {Milk, Bread} → {Butter} (Contains 3 items)
o Invalid Rule: {Milk} → {Bread} (Contains only 2 items)

 Maximum Length Constraint: Only consider rules with at most 2 items.


o Valid Rule: {Milk} → {Bread}
o Invalid Rule: {Milk, Bread} → {Butter}

✅ Use Case: A business may want to focus only on simple purchase relationships (2-item rules)
instead of complex multi-item rules.

3. Support Constraints
Support measures how frequently an itemset appears in the dataset. A minimum support
constraint ensures that only frequent itemsets are considered.

Example:

Consider the following transactions in a store:

Transaction ID Items Purchased

1 Milk, Bread, Butter

2 Milk, Eggs, Cheese

3 Milk, Bread, Cereal

4 Bread, Butter, Jam

5 Bread, Milk, Butter

 Minimum Support Constraint: Support ≥ 40%


o {Milk, Bread} appears in 3 out of 5 transactions (60%) ✅ (Valid)
o {Eggs, Cheese} appears in 1 out of 5 transactions (20%) ❌ (Invalid)

✅ Use Case: In retail, support constraints help filter out rarely occurring itemsets that are not
useful for marketing strategies.
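The support check above can be verified directly against the five transactions. This is a minimal sketch (`support` is a hypothetical helper):

```python
# Sketch: checking the 40% minimum-support constraint against the
# five transactions in the table above.

transactions = [
    {"Milk", "Bread", "Butter"},
    {"Milk", "Eggs", "Cheese"},
    {"Milk", "Bread", "Cereal"},
    {"Bread", "Butter", "Jam"},
    {"Bread", "Milk", "Butter"},
]

def support(itemset):
    # Fraction of transactions containing every item in `itemset`.
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

MIN_SUPPORT = 0.40
for itemset in ({"Milk", "Bread"}, {"Eggs", "Cheese"}):
    s = support(itemset)
    print(sorted(itemset), s, s >= MIN_SUPPORT)
# ['Bread', 'Milk'] 0.6 True
# ['Cheese', 'Eggs'] 0.2 False
```

{Milk, Bread} passes the threshold (60%) while {Eggs, Cheese} (20%) is pruned, matching the example above.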

4. Interestingness Constraints (Confidence & Lift)


These constraints ensure that the discovered rules are not just frequent but also meaningful.
 Confidence: Measures the likelihood of the right-hand side (consequent) given the left-hand
side (antecedent).
 Lift: Measures how much more likely the consequent is when the antecedent occurs.

Example:

 Rule: {Milk, Bread} → {Butter}


o Confidence: 80% (80% of transactions with {Milk, Bread} also contain {Butter})
o Lift: 2.5 (Butter is 2.5 times more likely to be bought with {Milk, Bread} than
randomly)

 Rule: {Juice} → {Eggs}


o Confidence: 30%
o Lift: 0.9 (Less than 1 means no strong association) ❌ (Discard this rule)

✅ Use Case: In targeted promotions, a store might focus only on rules with high lift values,
ensuring strong purchase relationships.
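The lift computation can be sketched as follows. The support values below are illustrative assumptions, chosen to reproduce the 2.5 and 0.9 lift figures quoted above:

```python
# Sketch: computing lift from support values.
# lift(A -> B) = support(A ∪ B) / (support(A) * support(B))

def lift(support_ab, support_a, support_b):
    return support_ab / (support_a * support_b)

# Consequent co-occurs 2.5x more often than chance -> strong rule:
print(round(lift(0.20, 0.25, 0.32), 2))   # 2.5

# Lift below 1 means the items co-occur less often than chance -> discard:
print(round(lift(0.09, 0.25, 0.40), 2))   # 0.9
```

Unlike confidence, lift corrects for how popular the consequent is on its own, which is why a lift below 1 signals no real association even when confidence looks nontrivial.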

5. Monotonic Constraints
A monotonic constraint means that if an itemset satisfies the constraint, then any of its supersets
will also satisfy it.

Example:

A clothing store is interested only in itemsets that contain "Shirt".

The constraint "itemset contains Shirt" is monotonic: once {Shirt, Jeans} satisfies it, every superset such as {Shirt, Jeans, Shoes} automatically satisfies it as well. (Note that frequency itself is not monotonic: a superset of a frequent itemset may be infrequent.)

✅ Use Case: Helps in efficient mining, because once a constraint is satisfied it never needs to be re-checked as itemsets grow.

6. Anti-Monotonic Constraints
An anti-monotonic constraint means that if an itemset violates the constraint, then all its
supersets will also violate it.

Example:

Suppose a store wants to analyze rules where the total price of items is below $50.
 {Milk ($10), Bread ($15), Butter ($20)} → Total = $45 ✅ (Valid)
 {Milk ($10), Bread ($15), Butter ($20), Cheese ($30)} → Total = $75 ❌ (Invalid)

Since {Milk, Bread, Butter, Cheese} exceeds $50, any superset containing these items will
also be invalid.

✅ Use Case: Budget-constrained recommendations for customers who prefer lower-priced bundles.
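Anti-monotonic constraints enable pruning during candidate generation. The sketch below grows itemsets level by level under a "total price below $50" constraint; the prices are the illustrative figures from the example above:

```python
# Sketch of anti-monotonic pruning for a "total price below $50" constraint.
# Once a candidate itemset exceeds the budget it is discarded, and -- because
# the constraint is anti-monotonic -- none of its supersets is ever generated.

prices = {"Milk": 10, "Bread": 15, "Butter": 20, "Cheese": 30}

def within_budget(itemset, limit=50):
    return sum(prices[i] for i in itemset) < limit

items = sorted(prices)
frontier = [frozenset({i}) for i in items if within_budget({i})]
valid = list(frontier)
while frontier:
    nxt = set()
    for s in frontier:
        for i in items:
            if i not in s:
                cand = s | {i}
                if within_budget(cand):  # prune: violators' supersets never built
                    nxt.add(cand)
    frontier = list(nxt)
    valid.extend(frontier)

print(frozenset({"Milk", "Bread", "Butter"}) in valid)             # True  ($45)
print(frozenset({"Milk", "Bread", "Butter", "Cheese"}) in valid)   # False ($75)
```

The $75 itemset is never even constructed, which is exactly the efficiency benefit of anti-monotonic constraints.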

Real-World Applications
1. Market Basket Analysis (Retail)

 Discover frequent item purchases.


 Use constraints to focus on high-value items or seasonal products.

2. Fraud Detection (Banking)

 Detect unusual transaction patterns.


 Apply constraints to find only high-risk fraud indicators.

3. Medical Diagnosis (Healthcare)

 Identify disease correlations based on symptoms.


 Apply constraints to focus on high-confidence relationships.

4. Web Usage Mining (E-commerce)

 Find patterns in user behavior on websites.


 Use constraints to filter out rare or irrelevant browsing patterns.

You might also like