Apriori Algorithm Examples

The Apriori algorithm is used in association rule mining to identify frequent itemsets and derive association rules from customer transaction data. It employs a bottom-up approach, generating candidate itemsets and pruning infrequent ones based on support and confidence thresholds. This technique aids in product placement, targeted marketing, and inventory management by revealing item relationships and purchasing patterns.

The Apriori algorithm is a classic algorithm used in association rule mining.

It's designed to
discover frequent itemsets in a transactional database and derive association rules from those
itemsets.

Problem:

Imagine you're a supermarket manager. You have a massive database of customer transactions
(i.e., what items each customer bought). You want to find out:

1. Frequently bought itemsets: Which combinations of items are often purchased


together? (e.g., bread and milk, beer and diapers).
2. Association rules: What are the relationships between these items? (e.g., if a customer
buys bread, they are likely to also buy milk).

This information can be used for:

 Product placement: Placing frequently bought items together to increase sales.


 Targeted marketing: Offering promotions on items often purchased together.
 Inventory management: Ensuring popular combinations are always in stock.

The Challenge:

The number of possible itemsets grows exponentially with the number of items. Manually
checking all combinations is computationally infeasible for large datasets.

Solution: The Apriori Algorithm

The Apriori algorithm solves this problem by using a "bottom-up" approach based on the
"Apriori property":

 Apriori Property: If an itemset is frequent, then all of its subsets must also be frequent.

Steps of the Algorithm:

1. Generate Candidate Itemsets (C1): Create a list of all individual items (1-itemsets).
2. Calculate Support: Scan the database and count the occurrences of each itemset in C1.
Calculate the support for each itemset (support = number of transactions containing the
itemset / total number of transactions).
3. Prune Infrequent Itemsets (L1): Define a minimum support threshold. Remove
itemsets from C1 that have support below this threshold, creating L1 (frequent 1-
itemsets).
4. Generate Candidate Itemsets (Ck): Generate new candidate itemsets (Ck) of size k by
combining frequent itemsets from L(k-1). Only combine itemsets that share the first (k-2)
items.
5. Calculate Support: Scan the database and calculate the support for each itemset in Ck.
6. Prune Infrequent Itemsets (Lk): Remove itemsets from Ck that have support below the
minimum support threshold, creating Lk (frequent k-itemsets).
7. Repeat Steps 4-6: Continue generating and pruning candidate itemsets until no more
frequent itemsets can be found.
8. Generate Association Rules: From the frequent itemsets, generate association rules
based on confidence (confidence = support(A∪B) / support(A)). Define a minimum
confidence threshold to filter out weak rules.
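The level-wise search in steps 1-7 can be sketched in Python. This is a minimal, unoptimized illustration (function and variable names are my own, not from a particular library); running it on a small transaction list returns every frequent itemset with its support count.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset.

    transactions: iterable of sets of items; min_support: absolute count.
    A direct sketch of steps 1-7 above, not an optimized implementation.
    """
    transactions = [set(t) for t in transactions]

    def count(candidates):
        # One database scan: how many transactions contain each candidate?
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    # Steps 1-3: candidate 1-itemsets (C1), then prune to L1
    c1 = {frozenset([item]) for t in transactions for item in t}
    freq = {c: s for c, s in count(c1).items() if s >= min_support}
    result, k = dict(freq), 2
    # Steps 4-7: build Ck from L(k-1), count support, prune, repeat
    while freq:
        candidates = {a | b for a in freq for b in freq if len(a | b) == k}
        # Apriori property: every (k-1)-subset of a candidate must be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in freq for s in combinations(c, k - 1))}
        freq = {c: s for c, s in count(candidates).items() if s >= min_support}
        result.update(freq)
        k += 1
    return result
```

On the worked example below (minimum support 2), this returns the seven frequent itemsets {A}, {B}, {C}, {F}, {A,C}, {B,C}, {B,F}.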


Example:

Transactional Database:

Transaction ID Items Bought


T1 A, B, C
T2 A, C
T3 A, D
T4 B, E, F
T5 B, C, F

Minimum Support Threshold: 2 (meaning an itemset must appear in at least 2 transactions)

Minimum Confidence Threshold: 60%

Steps:

1. C1: { {A}, {B}, {C}, {D}, {E}, {F} }


2. Support(C1):
o {A}: 3
o {B}: 3
o {C}: 3
o {D}: 1
o {E}: 1
o {F}: 2
3. L1: { {A}, {B}, {C}, {F} } (D and E are removed due to low support)
4. C2: { {A,B}, {A,C}, {A,F}, {B,C}, {B,F}, {C,F} }
5. Support(C2):
o {A,B}: 1
o {A,C}: 2
o {A,F}: 0
o {B,C}: 2
o {B,F}: 2
o {C,F}: 1
6. L2: { {A,C}, {B,C}, {B,F} }
7. C3: { {B,C,F} }
8. Support(C3):
o {B,C,F}: 1
9. L3: {} because the support of {B,C,F} is 1, which is less than the minimum support of 2.
10. Frequent Itemsets: { {A}, {B}, {C}, {F}, {A,C}, {B,C}, {B,F} }
11. Association Rules (Example):
o {A} -> {C}: Support(A,C) = 2, Support(A) = 3, Confidence = 2/3 = 66.67%
(meets minimum confidence)
o {B} -> {C}: Support(B,C) = 2, Support(B) = 3, Confidence = 2/3 = 66.67%
(meets minimum confidence)
o {B} -> {F}: Support(B,F) = 2, Support(B) = 3, Confidence = 2/3 = 66.67%
(meets minimum confidence)
o {C} -> {A}: Support(A,C) = 2, Support(C) = 3, Confidence = 2/3 = 66.67%
(meets minimum confidence)
o {C} -> {B}: Support(B,C) = 2, Support(C) = 3, Confidence = 2/3 = 66.67%
(meets minimum confidence)
o {F} -> {B}: Support(B,F) = 2, Support(F) = 2, Confidence = 2/2 = 100% (meets
minimum confidence)
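The support and confidence values above are easy to verify programmatically. A small sketch using the five transactions from this example (the `support` helper is illustrative):

```python
# Transactions T1-T5 from the example above
transactions = [{'A','B','C'}, {'A','C'}, {'A','D'}, {'B','E','F'}, {'B','C','F'}]

def support(itemset):
    """Number of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)

print(support({'A'}))            # 3
print(support({'F'}))            # 2
print(support({'A', 'C'}))       # 2
print(support({'B', 'C', 'F'}))  # 1 -> below minimum support 2, so L3 is empty

# Confidence of {F} -> {B} is support(B,F) / support(F)
print(support({'B', 'F'}) / support({'F'}))  # 1.0 (100%)
```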

Example 1: Online Book Store

 Scenario:
o An online bookstore wants to analyze customer purchase patterns to recommend
books and create bundled offers.
o Transaction data: Each transaction represents a customer's order, containing the
books they purchased.
 Transactional Data:
o T1: {Novel, Mystery, Thriller}
o T2: {Novel, Science Fiction}
o T3: {Mystery, Cookbook}
o T4: {Novel, Mystery}
o T5: {Novel, Cookbook, Science Fiction}
 Process:
o 1. Setting Parameters:
 Minimum Support: 2 (a set must appear in at least 2 transactions)
 We will focus on finding the frequent itemsets.
o 2. Finding Frequent 1-Itemsets (L1):
 Count occurrences:
 Novel: 4
 Mystery: 3
 Thriller: 1
 Science Fiction: 2
 Cookbook: 2
 L1: {Novel}, {Mystery}, {Science Fiction}, {Cookbook} (Thriller is below
the support threshold)
o 3. Generating Candidate 2-Itemsets (C2):
 C2: {Novel, Mystery}, {Novel, Science Fiction}, {Novel, Cookbook},
{Mystery, Science Fiction}, {Mystery, Cookbook}, {Science Fiction,
Cookbook}
o 4. Finding Frequent 2-Itemsets (L2):
 Count occurrences:
 {Novel, Mystery}: 2 (T1, T4)
 {Novel, Science Fiction}: 2 (T2, T5)
 All other pairs appear at most once.
 L2: {Novel, Mystery}, {Novel, Science Fiction}
o 5. Generating Candidate 3-Itemsets (C3):
 The only candidate, {Novel, Mystery, Science Fiction}, never occurs, so
L3 is empty.
o 6. Results:
 Frequent Itemsets: {Novel}, {Mystery}, {Science Fiction}, {Cookbook},
{Novel, Mystery}, {Novel, Science Fiction}
o 7. Association Rules (Example):
 {Mystery} -> {Novel} (Confidence = 2/3): customers who buy Mystery
books often also buy Novels.
 {Science Fiction} -> {Novel} (Confidence = 2/2): every purchase of a
Science Fiction book also included a Novel.
 Application:
o The bookstore can recommend Novels to customers who purchase Mystery or
Science Fiction books.
o They could also create a bundled offer for Novels and Mystery books.

Example 2: Grocery Store Analysis

 Scenario:
o A grocery store wants to optimize product placement.
o Transaction data: Records of customer purchases.
 Transactional Data:
o T1: {Milk, Bread, Eggs}
o T2: {Milk, Yogurt}
o T3: {Bread, Diapers, Beer}
o T4: {Milk, Bread, Diapers, Yogurt}
o T5: {Yogurt, Eggs}
 Process:
o 1. Setting Parameters:
 Minimum Support: 2
o 2. Finding Frequent 1-Itemsets (L1):
 Milk: 3, Bread: 3, Eggs: 2, Yogurt: 3, Diapers: 2, Beer: 1.
 L1: {Milk}, {Bread}, {Eggs}, {Yogurt}, {Diapers}
o 3. Generating Candidate 2-Itemsets (C2):
 C2: {Milk, Bread}, {Milk, Eggs}, {Milk, Yogurt}, {Milk, Diapers},
{Bread, Eggs}, {Bread, Yogurt}, {Bread, Diapers}, {Eggs, Yogurt},
{Eggs, Diapers}, {Yogurt, Diapers}
o 4. Finding Frequent 2-Itemsets (L2):
 After counting support ({Milk, Bread}: 2, {Milk, Yogurt}: 2, {Bread,
Diapers}: 2; every other pair appears only once), L2: {Milk, Bread},
{Milk, Yogurt}, {Bread, Diapers}
o 5. Generating Candidate 3-Itemsets (C3):
 C3: {Milk, Bread, Yogurt}
o 6. Finding Frequent 3-Itemsets (L3):
 {Milk, Bread, Yogurt} appears only in T4 (support 1), so L3 is empty.
o 7. Results:
 Frequent Itemsets: {Milk}, {Bread}, {Eggs}, {Yogurt}, {Diapers}, {Milk,
Bread}, {Milk, Yogurt}, {Bread, Diapers}
o 8. Association Rules (Example):
 {Diapers} -> {Bread} (Confidence = 2/2 = 100%)
 Application:
o The store can place Bread and Diapers close together.
o Milk can be shelved near both Bread and Yogurt.
o This information can be used for sales promotions.

Key takeaways from these examples:

 The Apriori algorithm efficiently narrows down the search for frequent itemsets.
 The support threshold is crucial in determining which itemsets are considered frequent.
 Association rules provide valuable insights into relationships between items.

Example: E-commerce Website Purchase Analysis

 Scenario:
o An e-commerce website selling various electronics wants to analyze customer
purchase patterns to optimize product recommendations and promotional offers.
o The dataset includes transactions with items like laptops, headphones, keyboards,
mice, and external hard drives.
 Transactional Data:
o T1: {Laptop, Headphones, Keyboard}
o T2: {Laptop, Mouse}
o T3: {Headphones, External Hard Drive}
o T4: {Laptop, Headphones}
o T5: {Laptop, Keyboard, Mouse}
o T6: {Headphones, Keyboard}
o T7: {Laptop, Headphones, External Hard Drive}
o T8: {Keyboard, Mouse}
o T9: {Laptop, Mouse}
o T10: {Headphones, External Hard Drive, Keyboard}
 Parameters:
o Minimum Support: 3
o Minimum Confidence: 60%
 Process:
o 1. Finding Frequent 1-Itemsets (L1):
 Count occurrences:
 Laptop: 6
 Headphones: 6
 Keyboard: 5
 Mouse: 4
 External Hard Drive: 3
 L1: {Laptop}, {Headphones}, {Keyboard}, {Mouse}, {External Hard
Drive}
o 2. Generating Candidate 2-Itemsets (C2):
 C2: {Laptop, Headphones}, {Laptop, Keyboard}, {Laptop, Mouse},
{Laptop, External Hard Drive}, {Headphones, Keyboard}, {Headphones,
Mouse}, {Headphones, External Hard Drive}, {Keyboard, Mouse},
{Keyboard, External Hard Drive}, {Mouse, External Hard Drive}
o 3. Finding Frequent 2-Itemsets (L2):
 After counting support:
 {Laptop, Headphones}: 3
 {Laptop, Mouse}: 3
 {Headphones, Keyboard}: 3
 {Headphones, External Hard Drive}: 3
 All other pairs appear at most twice (e.g. {Laptop, Keyboard}: 2,
{Keyboard, Mouse}: 2).
 L2: {Laptop, Headphones}, {Laptop, Mouse}, {Headphones, Keyboard},
{Headphones, External Hard Drive}
o 4. Generating Candidate 3-Itemsets (C3):
 C3: {Laptop, Headphones, Mouse}, {Headphones, Keyboard, External
Hard Drive}
o 5. Finding Frequent 3-Itemsets (L3):
 After counting support:
 {Laptop, Headphones, Mouse}: 0
 {Headphones, Keyboard, External Hard Drive}: 1
 L3: {} because neither candidate reaches the minimum support of 3.
o 6. Generating Association Rules:
 From L2 (minimum confidence 60%):
 {Mouse} -> {Laptop}: Support = 3/10, Confidence = 3/4 = 75%
 {Keyboard} -> {Headphones}: Support = 3/10, Confidence = 3/5
= 60%
 {External Hard Drive} -> {Headphones}: Support = 3/10,
Confidence = 3/3 = 100%
 The reverse rules fail the threshold: {Laptop} -> {Mouse}, {Laptop}
-> {Headphones}, {Headphones} -> {Keyboard}, and {Headphones} ->
{External Hard Drive} each have Confidence = 3/6 = 50%.
 Results and Applications:
o The e-commerce website can use these rules to:
 Recommend laptops to customers who purchase mice.
 Recommend headphones to customers who buy keyboards or external hard
drives.
 Create bundled offers for laptops and mice.
 Place keyboards and headphones together on the webpage.
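Confidence values for candidate rules can be computed directly from the ten transactions above. A small verification sketch (the `support` helper is illustrative):

```python
# Transactions T1-T10 from the e-commerce example
transactions = [
    {'Laptop', 'Headphones', 'Keyboard'}, {'Laptop', 'Mouse'},
    {'Headphones', 'External Hard Drive'}, {'Laptop', 'Headphones'},
    {'Laptop', 'Keyboard', 'Mouse'}, {'Headphones', 'Keyboard'},
    {'Laptop', 'Headphones', 'External Hard Drive'}, {'Keyboard', 'Mouse'},
    {'Laptop', 'Mouse'}, {'Headphones', 'External Hard Drive', 'Keyboard'},
]

def support(itemset):
    """Number of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

# Confidence of A -> B is support(A ∪ B) / support(A)
print(support({'Mouse', 'Laptop'}) / support({'Mouse'}))      # 0.75
print(support({'External Hard Drive', 'Headphones'})
      / support({'External Hard Drive'}))                     # 1.0
print(support({'Laptop', 'Headphones'}) / support({'Laptop'}))  # 0.5, below 60%
```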

Problem Statement

In large transactional databases, identifying frequent itemsets (groups of items that appear
together frequently) is a critical challenge for applications like market basket analysis.
Traditional methods of finding frequent itemsets can be computationally expensive due to the
exponential number of possible item combinations.

Solution: Apriori Algorithm

The Apriori Algorithm efficiently finds frequent itemsets using the Apriori property, which
states that:

"A subset of a frequent itemset must also be frequent."

This property helps in reducing the search space by eliminating infrequent itemsets early.

Example of Apriori Algorithm

Step 1: Given Transaction Dataset

Consider the following transactions in a grocery store:

Transaction ID Items Purchased

1 {Milk, Bread, Butter}

2 {Milk, Bread}

3 {Milk, Butter}

4 {Bread, Butter}

5 {Milk, Bread, Butter}

Step 2: Set Minimum Support

Let's set the minimum support count = 2 (an itemset must appear in at least 2 transactions to be
considered frequent).

Step 3: Find Frequent Itemsets

1. Find Frequent 1-itemsets (Count occurrences):

{Milk} → 4
{Bread} → 4
{Butter} → 4

All are frequent (≥2).

2. Generate Candidate 2-itemsets & Count Support:

{Milk, Bread} → 3
{Milk, Butter} → 3
{Bread, Butter} → 3

All are frequent.

3. Generate Candidate 3-itemset & Count Support:

{Milk, Bread, Butter} → 2

Frequent.

4. No Larger Frequent Itemsets (the dataset contains only three distinct items, so the search stops).

Step 4: Generate Association Rules

Using a minimum confidence threshold, we generate rules like:

 {Milk, Bread} → {Butter} (Confidence = 2/3)


 {Milk, Butter} → {Bread} (Confidence = 2/3)
 {Bread, Butter} → {Milk} (Confidence = 2/3)

These rules help businesses understand purchasing patterns and suggest relevant items to
customers.
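Rule generation from a frequent itemset can be sketched as follows, using the five grocery transactions above (the `rules_from` helper and its names are illustrative): every non-empty split of the itemset into antecedent and consequent is tested against the confidence threshold.

```python
from itertools import combinations

# Transactions from the grocery example above
transactions = [{'Milk', 'Bread', 'Butter'}, {'Milk', 'Bread'}, {'Milk', 'Butter'},
                {'Bread', 'Butter'}, {'Milk', 'Bread', 'Butter'}]

def support(itemset):
    return sum(1 for t in transactions if set(itemset) <= t)

def rules_from(itemset, min_confidence):
    """Yield (antecedent, consequent, confidence) for every non-empty split
    of `itemset` that meets the confidence threshold."""
    items = set(itemset)
    for r in range(1, len(items)):
        for antecedent in combinations(items, r):
            conf = support(items) / support(antecedent)
            if conf >= min_confidence:
                yield set(antecedent), items - set(antecedent), conf

for a, c, conf in rules_from({'Milk', 'Bread', 'Butter'}, 0.6):
    print(a, '->', c, f'confidence={conf:.2f}')
# Yields the three rules above, each with confidence 2/3; the single-item
# antecedents ({Milk}, {Bread}, {Butter}) fall below 60% and are filtered out.
```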
Conclusion

The Apriori algorithm is an efficient way to extract frequent itemsets and generate association
rules, helping businesses make data-driven decisions in areas like recommendation systems,
inventory management, and cross-selling.

Example 1: Market Basket Analysis


Problem

A supermarket wants to analyze customer purchase patterns to offer better discounts and
improve recommendations.

Dataset (Transactions)
Transaction ID Items Purchased

1 {Apple, Banana, Milk}

2 {Apple, Banana, Bread}

3 {Apple, Milk}

4 {Banana, Bread, Milk}

5 {Apple, Banana, Bread, Milk}

Applying Apriori Algorithm

Minimum Support Threshold = 2

Step 1: Frequent 1-itemsets


Item Count

Apple 4

Banana 4

Milk 4

Bread 3
Step 2: Frequent 2-itemsets
Itemset Count

{Apple, Banana} 3

{Apple, Milk} 3

{Banana, Milk} 3

{Banana, Bread} 3

{Apple, Bread} 2

{Bread, Milk} 2

Step 3: Frequent 3-itemsets


Itemset Count

{Apple, Banana, Milk} 2

{Apple, Banana, Bread} 2

{Banana, Bread, Milk} 2

Step 4: Association Rules

1. {Apple, Banana} → {Milk} (Confidence = 2/3)


2. {Banana, Bread} → {Milk} (Confidence = 2/3)
3. {Banana, Milk} → {Apple} (Confidence = 2/3)

Business Insight

 Customers who buy Apple and Banana are likely to buy Milk.
 A discount on Banana & Bread could increase Milk sales.
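For a dataset this small, the frequent itemsets can also be found by brute force, which makes a useful cross-check of the tables above (a sketch; on real data the Apriori pruning is what keeps this tractable):

```python
from itertools import combinations

# Transactions from Example 1
transactions = [{'Apple', 'Banana', 'Milk'}, {'Apple', 'Banana', 'Bread'},
                {'Apple', 'Milk'}, {'Banana', 'Bread', 'Milk'},
                {'Apple', 'Banana', 'Bread', 'Milk'}]
min_support = 2

# Enumerate every possible itemset and keep those meeting min_support
items = sorted(set().union(*transactions))
frequent = {}
for k in range(1, len(items) + 1):
    for cand in combinations(items, k):
        s = sum(1 for t in transactions if set(cand) <= t)
        if s >= min_support:
            frequent[cand] = s

print(frequent[('Apple', 'Banana', 'Milk')])  # 2
print(frequent[('Banana', 'Bread', 'Milk')])  # 2
```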

Example 2: Online Retail Analysis


Problem

An e-commerce website wants to suggest related products based on previous customer purchases.

Dataset (Transactions)
Transaction ID Products Purchased

1 {Laptop, Mouse, Keyboard}



2 {Laptop, Mouse}

3 {Laptop, Keyboard}

4 {Mouse, Keyboard, Headphones}

5 {Laptop, Mouse, Keyboard, Headphones}

Applying Apriori Algorithm

Minimum Support Threshold = 2

Step 1: Frequent 1-itemsets


Item Count

Laptop 4

Mouse 4

Keyboard 4

Headphones 2

Step 2: Frequent 2-itemsets


Itemset Count

{Laptop, Mouse} 3

{Laptop, Keyboard} 3

{Mouse, Keyboard} 3

{Mouse, Headphones} 2

{Keyboard, Headphones} 2

Step 3: Frequent 3-itemsets


Itemset Count

{Laptop, Mouse, Keyboard} 2

{Mouse, Keyboard, Headphones} 2

Step 4: Association Rules

1. {Laptop, Mouse} → {Keyboard} (Confidence = 2/3)


2. {Mouse, Keyboard} → {Laptop} (Confidence = 2/3)
3. {Laptop} → {Mouse} (Confidence = 3/4)

Business Insight

 Customers buying a Laptop and Mouse often buy a Keyboard.


 Customers buying Keyboards & Mice frequently buy Laptops.
 The store can bundle Laptop, Mouse & Keyboard as a package deal.

Example 3: Medical Diagnosis


Problem

A hospital wants to analyze common symptoms that appear together to improve disease
diagnosis.

Dataset (Patients and Symptoms)


Patient ID Symptoms

1 {Fever, Cough, Fatigue}

2 {Fever, Cough}

3 {Cough, Fatigue}

4 {Fever, Fatigue}

5 {Fever, Cough, Fatigue}

Applying Apriori Algorithm

Minimum Support Threshold = 2

Step 1: Frequent 1-itemsets


Symptom Count

Fever 4

Cough 4

Fatigue 4
Step 2: Frequent 2-itemsets
Symptom Pair Count

{Fever, Cough} 3

{Fever, Fatigue} 3

{Cough, Fatigue} 3

Step 3: Frequent 3-itemsets


Symptom Group Count

{Fever, Cough, Fatigue} 2

Step 4: Association Rules

1. {Fever, Cough} → {Fatigue} (Confidence = 2/3)


2. {Fever, Fatigue} → {Cough} (Confidence = 2/3)
3. {Cough, Fatigue} → {Fever} (Confidence = 2/3)

Medical Insight

 Patients with Fever and Cough are likely to develop Fatigue.


 Doctors should monitor Cough & Fatigue for potential Fever onset.

Conclusion

The Apriori Algorithm helps uncover patterns in various domains:

1. Retail – Understanding product co-occurrence for discounts & recommendations.


2. E-commerce – Suggesting frequently bought products together.
3. Healthcare – Identifying symptom relationships for better diagnosis.

Example 1: Market Basket Analysis


Problem Statement

A supermarket wants to analyze customer purchase patterns to offer better discounts and
improve recommendations.

Step 1: Dataset Representation


Transactions:

Transaction ID Items Purchased

1 {Apple, Banana, Milk}

2 {Apple, Banana, Bread}

3 {Apple, Milk}

4 {Banana, Bread, Milk}

5 {Apple, Banana, Bread, Milk}

We assume a minimum support count of 2.

Step 2: Generate Frequent 1-itemsets


Count each item’s occurrence:

Item Count

Apple 4

Banana 4

Milk 4

Bread 3

All items are frequent (≥2).

Step 3: Generate Frequent 2-itemsets


Find combinations of two items and count occurrences.

Itemset Count

{Apple, Banana} 3

{Apple, Milk} 3

{Banana, Milk} 3

{Banana, Bread} 3

{Apple, Bread} 2

{Bread, Milk} 2

All 2-itemsets are frequent (≥2).

Step 4: Generate Frequent 3-itemsets


Find combinations of three items from the 2-itemsets.

Itemset Count

{Apple, Banana, Milk} 2

{Apple, Banana, Bread} 2

{Banana, Bread, Milk} 2

All are frequent.

Step 5: Generate Association Rules


Now, we generate rules using minimum confidence = 50%.

1. {Apple, Banana} → {Milk}


o Support = 2/5 = 40%
o Confidence = 2/3 = 66.7% ✅

2. {Banana, Bread} → {Milk}


o Support = 2/5 = 40%
o Confidence = 2/3 = 66.7% ✅

3. {Banana, Milk} → {Apple}


o Support = 2/5 = 40%
o Confidence = 2/3 = 66.7% ✅
Business Insights
 Customers buying Apple and Banana are likely to buy Milk.
 A discount on Banana & Bread could increase Milk sales.
 Suggest Apple to customers buying Banana & Milk.

Example 2: Online Retail Analysis


Problem Statement

An e-commerce website wants to suggest related products based on previous customer purchases.

Step 1: Dataset Representation


Transaction ID Products Purchased

1 {Laptop, Mouse, Keyboard}

2 {Laptop, Mouse}

3 {Laptop, Keyboard}

4 {Mouse, Keyboard, Headphones}

5 {Laptop, Mouse, Keyboard, Headphones}

Minimum support count = 2.

Step 2: Generate Frequent 1-itemsets


Item Count

Laptop 4

Mouse 4

Keyboard 4

Headphones 2

All are frequent.

Step 3: Generate Frequent 2-itemsets


Itemset Count

{Laptop, Mouse} 3

{Laptop, Keyboard} 3

{Mouse, Keyboard} 3

{Mouse, Headphones} 2

{Keyboard, Headphones} 2

All are frequent.

Step 4: Generate Frequent 3-itemsets


Itemset Count

{Laptop, Mouse, Keyboard} 2

{Mouse, Keyboard, Headphones} 2

Both are frequent.

Step 5: Generate Association Rules


1. {Laptop, Mouse} → {Keyboard}
o Support = 2/5 = 40%
o Confidence = 2/3 = 66.7% ✅
2. {Mouse, Keyboard} → {Laptop}
o Support = 2/5 = 40%
o Confidence = 2/3 = 66.7% ✅

3. {Laptop} → {Mouse}
o Support = 3/5 = 60%
o Confidence = 3/4 = 75% ✅

Business Insights
 Customers buying a Laptop and Mouse often buy a Keyboard.
 Customers buying Keyboards & Mice frequently buy Laptops.
 The store can bundle Laptop, Mouse & Keyboard as a package deal.

Example 3: Medical Diagnosis


Problem Statement

A hospital wants to analyze common symptoms that appear together to improve disease
diagnosis.

Step 1: Dataset Representation


Patient ID Symptoms

1 {Fever, Cough, Fatigue}

2 {Fever, Cough}

3 {Cough, Fatigue}

4 {Fever, Fatigue}

5 {Fever, Cough, Fatigue}

Minimum support count = 2.


Step 2: Generate Frequent 1-itemsets
Symptom Count

Fever 4

Cough 4

Fatigue 4

All are frequent.

Step 3: Generate Frequent 2-itemsets


Symptom Pair Count

{Fever, Cough} 3

{Fever, Fatigue} 3

{Cough, Fatigue} 3

All are frequent.

Step 4: Generate Frequent 3-itemsets


Symptom Group Count

{Fever, Cough, Fatigue} 2

This is frequent.

Step 5: Generate Association Rules


1. {Fever, Cough} → {Fatigue}
o Support = 2/5 = 40%
o Confidence = 2/3 = 66.7% ✅

2. {Fever, Fatigue} → {Cough}


o Support = 2/5 = 40%
o Confidence = 2/3 = 66.7% ✅

3. {Cough, Fatigue} → {Fever}


o Support = 2/5 = 40%
o Confidence = 2/3 = 66.7% ✅

Medical Insights
 Patients with Fever and Cough are likely to develop Fatigue.
 Doctors should monitor Cough & Fatigue for potential Fever onset.

Mining Single-Dimensional Boolean Association Rules from Transactional Databases
1. Introduction

Mining association rules is a key technique in data mining, used to discover relationships
between items in large transactional databases.
A single-dimensional Boolean association rule means:

 Single-dimensional: The rule involves only one attribute (e.g., "Items Purchased").
 Boolean: The presence (1) or absence (0) of an item is considered (no quantity or other
attributes).

For example, in a supermarket:

{Milk, Bread} → {Butter}


(If a customer buys Milk and Bread, they are likely to buy Butter)

2. Problem Definition
Given a transactional database, we aim to find frequent itemsets and derive association rules
based on the Apriori algorithm.

Transactional Database Example


Transaction ID Items Purchased
1 {Milk, Bread, Butter}
2 {Milk, Bread}
3 {Milk, Butter}
4 {Bread, Butter}
5 {Milk, Bread, Butter}

Goal

1. Find Frequent Itemsets


o Identify sets of items that frequently occur together.
2. Generate Association Rules
o Find relationships between itemsets using support and confidence.

3. Steps in Mining Boolean Association Rules


We follow two major steps:

Step 1: Finding Frequent Itemsets (Using Apriori Algorithm)

A frequent itemset is an itemset that appears in the transactions at least min_support times.

Step 2: Generating Association Rules

Association rules are derived from frequent itemsets and must meet min_confidence.

4. Step 1: Finding Frequent Itemsets


(a) Set Minimum Support

Let’s assume min_support = 2 (an itemset must appear in at least 2 transactions).

(b) Finding Frequent 1-itemsets

Count the frequency of individual items:

Item Frequency
Milk 4
Bread 4
Butter 4

All items occur at least 2 times, so they are frequent.

(c) Finding Frequent 2-itemsets

Find all possible pairs and count their occurrence:

Itemset Frequency
{Milk, Bread} 3
{Milk, Butter} 3
{Bread, Butter} 3

All itemsets appear ≥ 2 times, so they are frequent.

(d) Finding Frequent 3-itemsets

Find combinations of three items:

Itemset Frequency
{Milk, Bread, Butter} 2

This itemset is frequent (≥2).

5. Step 2: Generating Association Rules


We use minimum confidence (min_confidence = 60%) to filter the best rules.

Generating Rules from Frequent Itemsets

(a) From 2-itemsets

1. {Milk} → {Bread}
o Support = 3/5 = 60%
o Confidence = 3/4 = 75% ✅ (Accepted)
2. {Bread} → {Milk}
o Support = 3/5 = 60%
o Confidence = 3/4 = 75% ✅ (Accepted)
3. {Milk} → {Butter}
o Support = 3/5 = 60%
o Confidence = 3/4 = 75% ✅ (Accepted)

(b) From 3-itemsets

1. {Milk, Bread} → {Butter}


o Support = 2/5 = 40%
o Confidence = 2/3 = 66.7% ✅ (Accepted)
2. {Milk, Butter} → {Bread}
o Support = 2/5 = 40%
o Confidence = 2/3 = 66.7% ✅ (Accepted)

6. Interpretation and Business Insights


Insights for Supermarket

 Customers buying Milk and Bread are likely to buy Butter. → Recommend Butter to
such customers.
 Discount on Bread might increase Milk sales.
 Bundle Milk, Bread & Butter together as a combo deal.

7. Conclusion
 Single-dimensional Boolean association rules help find item relationships in
transactions.
 The Apriori Algorithm efficiently finds frequent itemsets.
 Support & Confidence metrics help derive strong association rules.

Mining Multi-Dimensional Association Rules from Relational Databases & Data Warehouses
1. Introduction
Multi-dimensional association rule mining extends traditional (single-dimensional) association
rule mining by incorporating multiple attributes from relational databases or data warehouses.

 Single-dimensional: Uses only one attribute (e.g., "Items Purchased") → {Milk,


Bread} → {Butter}
 Multi-dimensional: Uses multiple attributes (e.g., "Age", "Region", "Category") →
{Age: 20-30, Gender: Male, Item: Laptop} → {Item: Mouse}

Applications of Multi-Dimensional Association Rules

 Retail: Analyze how customer demographics influence purchases.


 Healthcare: Find disease patterns based on age, gender, and lifestyle.
 Finance: Identify fraud patterns based on location, transaction type, and amount.

2. Problem Definition
Example: Retail Transaction Database

A supermarket wants to analyze the influence of customer attributes on purchases.


We use a relational database with the following schema:

Transaction ID Customer Age Gender Location Purchased Items


1 25-34 Male New York {Milk, Bread}
2 35-44 Female Chicago {Milk, Butter}
3 25-34 Male New York {Milk, Bread, Butter}
4 45-54 Male Los Angeles {Bread, Butter}
5 25-34 Female Chicago {Milk, Bread}

Goal

1. Find frequent itemsets considering customer attributes.


2. Generate association rules based on demographic influence.

3. Types of Multi-Dimensional Association Rules


1. Inter-dimensional Association Rules
o Involve multiple attributes → {Age: 25-34, Location: New York} → {Milk,
Bread}
2. Hybrid-Dimensional Association Rules
o Involve both categorical attributes & transactional items → {Gender: Male,
Milk} → {Bread}
3. Quantitative Association Rules
o Use numeric ranges → {Age: 25-34} → {Milk, Bread}

4. Steps in Mining Multi-Dimensional Association Rules


Step 1: Transform Relational Data into Transaction Format

We represent data in a structured Boolean matrix:

Transaction ID | Age: 25-34 | Age: 35-44 | Age: 45-54 | Male | Female | NY | Chicago | LA | Milk | Bread | Butter
1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0
2 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1
3 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1
4 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1
5 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0

Each 1 represents the presence of an attribute/item in a transaction.
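The transformation from relational rows into this Boolean transaction format can be sketched as follows. Each attribute value becomes an "Age: …"-style token, so multi-dimensional mining reduces to ordinary itemset mining (field names and token formats here are illustrative):

```python
# Relational rows from the retail example above
rows = [
    {'age': '25-34', 'gender': 'Male',   'location': 'New York',    'items': {'Milk', 'Bread'}},
    {'age': '35-44', 'gender': 'Female', 'location': 'Chicago',     'items': {'Milk', 'Butter'}},
    {'age': '25-34', 'gender': 'Male',   'location': 'New York',    'items': {'Milk', 'Bread', 'Butter'}},
    {'age': '45-54', 'gender': 'Male',   'location': 'Los Angeles', 'items': {'Bread', 'Butter'}},
    {'age': '25-34', 'gender': 'Female', 'location': 'Chicago',     'items': {'Milk', 'Bread'}},
]

def to_transaction(row):
    # Attribute values become items alongside the purchased products
    tokens = {f'Age: {row["age"]}', f'Gender: {row["gender"]}',
              f'Location: {row["location"]}'}
    return tokens | row['items']

transactions = [to_transaction(r) for r in rows]

# Boolean matrix: one column per attribute-value token or item
columns = sorted(set().union(*transactions))
matrix = [[1 if c in t else 0 for c in columns] for t in transactions]

# Support counting now works exactly as in the single-dimensional case,
# e.g. support of {Age: 25-34, Milk}:
print(sum(1 for t in transactions if {'Age: 25-34', 'Milk'} <= t))  # 3
```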

Step 2: Apply Apriori Algorithm to Find Frequent Itemsets

(a) Set Minimum Support

Assume min_support = 2.

(b) Frequent 1-itemsets

Itemset Frequency
Age: 25-34 3
Male 3
Female 2
NY 2
Chicago 2
Milk 4
Bread 4
Butter 3

All are frequent (≥2).


(c) Frequent 2-itemsets

Itemset Frequency
{Age: 25-34, Milk} 3
{Male, Bread} 3
{Milk, Bread} 3
{NY, Milk} 2
{Chicago, Milk} 2

All are frequent.

(d) Frequent 3-itemsets

Itemset Frequency
{Age: 25-34, Milk, Bread} 3
{Male, Milk, Bread} 2

All are frequent.

Step 3: Generate Association Rules

We use min_confidence = 60%.

Rules from 2-itemsets

1. {Age: 25-34} → {Milk}


o Confidence = 3/3 = 100% ✅
2. {Male} → {Bread}
o Confidence = 3/3 = 100% ✅
3. {Milk} → {Bread}
o Confidence = 3/4 = 75% ✅

Rules from 3-itemsets

1. {Age: 25-34, Milk} → {Bread}


o Confidence = 3/3 = 100% ✅
2. {Male, Milk} → {Bread}
o Confidence = 2/2 = 100% ✅
5. Interpretation & Business Insights
Insights for Supermarket

1. Younger customers (25-34) are highly likely to buy Milk.


→ Advertise Milk deals to this age group.
2. Males are more likely to buy Bread.
→ Offer combo discounts on Bread for male shoppers.
3. If a customer buys Milk, they are likely to buy Bread.
→ Bundle Milk & Bread together.

6. Conclusion
 Multi-dimensional association rule mining helps uncover hidden relationships between
multiple attributes.
 Relational databases are transformed into Boolean transaction format for mining.
 The Apriori Algorithm is applied to find frequent itemsets & rules.
 Business insights allow for better marketing strategies.

More Examples of Multi-Dimensional Association Rule Mining

Here are some additional real-world examples where multi-dimensional association rule mining
is applied:

1. Healthcare - Disease Prediction


Dataset Example

Patient ID Age Group Gender Smoking Exercise Disease


1 40-50 Male Yes No Heart Disease
2 30-40 Female No Yes No Disease
3 40-50 Male Yes No Heart Disease
4 50-60 Female No No Diabetes
5 30-40 Male Yes Yes No Disease

Frequent Itemsets
 {Smoking: Yes, Exercise: No} → {Heart Disease}
 {Age: 40-50, Smoking: Yes} → {Heart Disease}
 {Age: 50-60, No Exercise} → {Diabetes}

Insights

 Patients who smoke and do not exercise are highly likely to develop heart disease. →
Preventive health programs should target this group.
 Elderly patients (50-60) with no exercise habits have a higher risk of diabetes. →
Health awareness campaigns should focus on promoting exercise.

2. Banking - Credit Card Fraud Detection


Dataset Example

Transaction ID Age Group Region Online Purchase Large Amount Fraudulent?


1 20-30 USA Yes Yes Yes
2 30-40 Canada No No No
3 20-30 USA Yes Yes Yes
4 40-50 UK No Yes No
5 20-30 USA Yes No No

Frequent Itemsets
 {Age: 20-30, Online Purchase: Yes, Large Amount: Yes} → {Fraudulent}
 {Region: USA, Online Purchase: Yes} → {Fraudulent}

Insights

 Young customers (20-30) making large online purchases are at high risk of fraud. →
Banks should flag such transactions for verification.
 Online transactions from the USA show a higher fraud rate. → Implement additional
security for online transactions in this region.

3. E-commerce - Product Recommendation


Dataset Example

User ID Age Group Gender Browsed Category Purchased Category


1 20-30 Male Electronics Accessories
2 30-40 Female Clothing Shoes
3 20-30 Male Electronics Accessories
4 40-50 Female Books Books
5 20-30 Male Electronics Electronics

Frequent Itemsets
 {Age: 20-30, Browsed: Electronics} → {Purchased: Accessories}
 {Age: 30-40, Browsed: Clothing} → {Purchased: Shoes}

Insights

 Young males (20-30) browsing electronics often buy accessories. → Show targeted
ads for accessories when they browse electronics.
 Women aged 30-40 browsing clothing often buy shoes. → Suggest shoe
recommendations during clothing purchases.

4. Supermarket - Seasonal Purchase Trends


Dataset Example

Transaction ID Season Customer Type Product Purchased


1 Winter Tourist Jacket
2 Summer Local Cold Drinks
3 Winter Local Heater
4 Summer Tourist Sunglasses
5 Winter Tourist Jacket

Frequent Itemsets
 {Season: Winter, Customer: Tourist} → {Jacket}
 {Season: Summer, Customer: Local} → {Cold Drinks}

Insights

 Tourists in winter mostly buy jackets. → Offer discount bundles for jackets to tourists.
 Locals in summer buy cold drinks. → Increase cold drink promotions in local stores
during summer.

5. Social Media - Ad Targeting


Dataset Example

User ID Age Group Gender Interests Clicked Ad?


1 20-30 Male Sports Yes
2 30-40 Female Cooking No
3 20-30 Male Fitness Yes
4 40-50 Male Business No
5 20-30 Male Sports Yes

Frequent Itemsets
 {Age: 20-30, Gender: Male, Interest: Sports} → {Clicked Ad}
 {Interest: Fitness} → {Clicked Ad}

Insights

 Young males (20-30) interested in sports are more likely to click on ads. → Target
this group for sports product ads.
 Users interested in fitness click on ads frequently. → Increase fitness-related ad
campaigns.

Conclusion

 Multi-dimensional association rule mining uncovers hidden insights by analyzing multiple attributes.
 These rules are valuable in marketing, fraud detection, healthcare, and e-commerce.
 Businesses can optimize strategies based on demographic, geographic, and behavioral
patterns.

Detailed Explanation of Multi-Dimensional Association Rule Mining for the Above Examples
Multi-dimensional association rule mining involves extracting hidden patterns from databases
with multiple attributes. We will go through a detailed step-by-step explanation using one of the
examples: E-commerce - Product Recommendation.

1. Understanding the Dataset


In an E-commerce platform, we analyze user behavior based on attributes like age, gender,
browsing history, and purchased products.

Example Dataset: E-commerce Transactions

User ID Age Group Gender Browsed Category Purchased Category


1 20-30 Male Electronics Accessories
2 30-40 Female Clothing Shoes
3 20-30 Male Electronics Accessories
4 40-50 Female Books Books
5 20-30 Male Electronics Electronics

Goal

 Identify patterns between user demographics and purchases.


 Generate association rules to optimize product recommendations.

2. Transforming the Data into Boolean Format
To apply the Apriori algorithm, we convert the categorical attributes into Boolean format.

User ID | Age: 20-30 | Age: 30-40 | Age: 40-50 | Male | Female | Browsed: Electronics | Browsed: Clothing | Browsed: Books | Purchased: Accessories | Purchased: Shoes | Purchased: Books | Purchased: Electronics
1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0
2 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0
3 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0
4 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0
5 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1

Each 1 indicates the presence of a characteristic in the transaction.
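This transformation can be sketched in a few lines of Python. The snippet below is a minimal illustration (the record fields mirror the five-row sample dataset; `to_boolean` is a hypothetical helper name):

```python
# A minimal sketch of the Boolean (one-hot) transformation described above:
# each record becomes the set of "Attribute: value" items it contains.
# The records mirror the five-row e-commerce dataset.

records = [
    {"Age": "20-30", "Gender": "Male",   "Browsed": "Electronics", "Purchased": "Accessories"},
    {"Age": "30-40", "Gender": "Female", "Browsed": "Clothing",    "Purchased": "Shoes"},
    {"Age": "20-30", "Gender": "Male",   "Browsed": "Electronics", "Purchased": "Accessories"},
    {"Age": "40-50", "Gender": "Female", "Browsed": "Books",       "Purchased": "Books"},
    {"Age": "20-30", "Gender": "Male",   "Browsed": "Electronics", "Purchased": "Electronics"},
]

def to_boolean(records):
    # Turn each record into the set of "Attribute: value" items it contains.
    return [{f"{attr}: {val}" for attr, val in r.items()} for r in records]

transactions = to_boolean(records)
print(sorted(transactions[0]))
# ['Age: 20-30', 'Browsed: Electronics', 'Gender: Male', 'Purchased: Accessories']
```

Representing each row as a set of "Attribute: value" items is exactly what lets a single-dimensional algorithm like Apriori run over multi-dimensional data.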

3. Finding Frequent Itemsets (Using the Apriori Algorithm)
Step 1: Set Minimum Support
We assume min_support = 2 (an itemset must appear in at least 2 transactions).

Step 2: Finding Frequent 1-itemsets


Count the frequency of individual attributes.

Itemset Frequency
Age: 20-30 3
Male 3
Browsed: Electronics 3
Purchased: Accessories 2

All these itemsets occur ≥ 2 times, so they are frequent.

Step 3: Finding Frequent 2-itemsets


Find combinations of two attributes.

Itemset Frequency
{Age: 20-30, Browsed: Electronics} 3
{Age: 20-30, Male} 3
{Male, Browsed: Electronics} 3
{Browsed: Electronics, Purchased: Accessories} 2

All appear at least 2 times, so they are frequent.

Step 4: Finding Frequent 3-itemsets


Find combinations of three attributes.

Itemset Frequency
{Age: 20-30, Browsed: Electronics, Purchased: Accessories} 2
{Male, Browsed: Electronics, Purchased: Accessories} 2

Both itemsets occur ≥ 2 times, so they are frequent.


4. Generating Association Rules
We use minimum confidence = 60% to filter strong rules.

Step 1: Rules from 2-itemsets


1. {Age: 20-30} → {Browsed: Electronics}
o Support = 3/5 = 60%
o Confidence = 3/3 = 100% ✅ (Accepted)
2. {Browsed: Electronics} → {Purchased: Accessories}
o Support = 2/5 = 40%
o Confidence = 2/3 = 66.7% ✅ (Accepted)
3. {Male} → {Browsed: Electronics}
o Support = 3/5 = 60%
o Confidence = 3/3 = 100% ✅ (Accepted)

Step 2: Rules from 3-itemsets


1. {Age: 20-30, Browsed: Electronics} → {Purchased: Accessories}
o Support = 2/5 = 40%
o Confidence = 2/3 = 66.7% ✅ (Accepted)
2. {Male, Browsed: Electronics} → {Purchased: Accessories}
o Support = 2/5 = 40%
o Confidence = 2/3 = 66.7% ✅ (Accepted)
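The support and confidence figures above can be recomputed programmatically from the Boolean transactions. The sketch below is a minimal illustration (`rule_metrics` is a hypothetical helper, not a library function):

```python
# Sketch: recomputing the rule metrics above from the Boolean transactions.
# The five sets mirror the e-commerce table rows.

transactions = [
    {"Age: 20-30", "Male", "Browsed: Electronics", "Purchased: Accessories"},
    {"Age: 30-40", "Female", "Browsed: Clothing", "Purchased: Shoes"},
    {"Age: 20-30", "Male", "Browsed: Electronics", "Purchased: Accessories"},
    {"Age: 40-50", "Female", "Browsed: Books", "Purchased: Books"},
    {"Age: 20-30", "Male", "Browsed: Electronics", "Purchased: Electronics"},
]

def count(itemset):
    # Number of transactions containing every item in `itemset`.
    return sum(1 for t in transactions if itemset <= t)

def rule_metrics(antecedent, consequent):
    # Returns (support, confidence) for the rule antecedent -> consequent.
    both = count(antecedent | consequent)
    return both / len(transactions), both / count(antecedent)

sup, conf = rule_metrics({"Age: 20-30"}, {"Browsed: Electronics"})
print(sup, conf)   # 0.6 1.0  -> support 60%, confidence 100%

sup, conf = rule_metrics({"Browsed: Electronics"}, {"Purchased: Accessories"})
print(round(sup, 2), round(conf, 3))   # 0.4 0.667
```

The printed values match the accepted rules in steps 1 and 2 above.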

5. Business Insights & Interpretation


Key Insights for the E-commerce Store

1. Young males (20-30) often browse electronics.


→ Targeted advertisements should focus on electronics for this age group.
2. Customers who browse electronics are likely to buy accessories.
→ Recommend accessories when a user views electronics.
3. Males have a high tendency to browse electronics.
→ Increase male-focused electronics promotions.
4. Users in the 20-30 age group who browse electronics frequently buy accessories.
→ Bundle accessories with electronics for better sales.
6. Conclusion
 Multi-dimensional association rule mining uncovers hidden relationships between
user demographics and purchasing behavior.
 The Apriori algorithm is used to find frequent itemsets and generate rules.
 Businesses can use this analysis for targeted marketing, product recommendations,
and sales optimization.

 Transactional Databases:
o These databases store records of transactions, where each transaction consists of a
set of items.
o For example, a supermarket's transactional database would record each customer's
purchase as a separate transaction, with the items bought listed within that
transaction.
 Boolean Association Rules:
o These rules deal with the presence or absence of items, rather than their
quantities.
o They express relationships in the form of "if A, then B," where A and B are sets
of items.
o "Boolean" signifies that we're concerned with whether an item is present (true) or
absent (false) in a transaction.
 Single-Dimensional:
o This means that the association rules involve only one attribute or dimension. In
the context of market basket analysis, this typically refers to the items purchased.
o Contrast this with multi-dimensional association rules, which might involve
attributes like customer age, income, or location.
 Association Rule Mining:
o The goal is to discover interesting relationships or patterns between items in a
transactional database.
o These relationships are expressed as association rules.

Key Metrics:

 Support:
o The proportion of transactions that contain both itemset A and itemset B.
o Support(A → B) = (Number of transactions containing A ∪ B) / (Total number of transactions)
o It measures how frequently the rule applies to the dataset.
 Confidence:
o The proportion of transactions that contain itemset A that also contain itemset B.
o Confidence(A → B) = (Number of transactions containing A ∪ B) / (Number of transactions containing A)
o It measures how often the rule is true.
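These two formulas translate directly into code. The snippet below is a hypothetical sketch (function names and the toy database `db` are illustrative, not from any library):

```python
# Hypothetical helper functions implementing the two metric formulas above;
# each transaction is a set of item names.

def support(transactions, itemset):
    # Fraction of transactions containing every item in `itemset`.
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(transactions, antecedent, consequent):
    # support(A ∪ B) / support(A) for the rule A -> B.
    a, b = set(antecedent), set(consequent)
    return support(transactions, a | b) / support(transactions, a)

# Toy basket database:
db = [{"Bread", "Milk"}, {"Bread", "Beer"}, {"Milk", "Beer"}, {"Bread", "Milk"}]
print(support(db, {"Bread", "Milk"}))                 # 0.5
print(round(confidence(db, {"Bread"}, {"Milk"}), 3))  # 0.667
```

Here {Bread, Milk} appears in 2 of 4 baskets (support 50%), and 2 of the 3 baskets containing Bread also contain Milk (confidence ≈ 66.7%).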

Example:
Let's consider a simple transactional database of a small grocery store:

 Transactions:
o T1: {Bread, Milk}
o T2: {Bread, Diapers, Beer, Eggs}
o T3: {Milk, Diapers, Beer, Cola}
o T4: {Bread, Milk, Diapers, Beer}
o T5: {Bread, Milk, Diapers, Cola}

Mining Single-Dimensional Boolean Association Rules:

1. Frequent Itemsets:
o We need to find itemsets that appear frequently in the transactions.
o Let's set a minimum support threshold of 3.
o Frequent itemsets:
 {Bread}: 4
 {Milk}: 4
 {Diapers}: 4
 {Beer}: 3
 {Bread, Milk}: 3
 {Bread, Diapers}: 3
 {Milk, Diapers}: 3
 {Diapers, Beer}: 3 (note: {Bread, Milk, Diapers} appears in only 2 transactions, so it is not frequent at this threshold)
2. Generating Association Rules:
o From the frequent itemsets, we can generate association rules.
o Let's set a minimum confidence threshold of 60%.
o Example rules:
 {Bread} → {Milk}:
 Support = 3/5 = 60%
 Confidence = 3/4 = 75%
 {Diapers} → {Beer}:
 Support = 3/5 = 60%
 Confidence = 3/4 = 75%
 {Milk} -> {Diapers}:
 Support = 3/5 = 60%
 Confidence = 3/4 = 75%
 {Bread} -> {Diapers}:
 Support = 3/5 = 60%
 Confidence = 3/4 = 75%
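The rule metrics above can be verified directly against the five transactions. The sketch below is a minimal illustration (`metrics` is a hypothetical helper):

```python
# Sketch: checking the example rules against transactions T1-T5.

transactions = [
    {"Bread", "Milk"},                        # T1
    {"Bread", "Diapers", "Beer", "Eggs"},     # T2
    {"Milk", "Diapers", "Beer", "Cola"},      # T3
    {"Bread", "Milk", "Diapers", "Beer"},     # T4
    {"Bread", "Milk", "Diapers", "Cola"},     # T5
]

def metrics(antecedent, consequent):
    # Returns (support, confidence) for the rule antecedent -> consequent.
    n = len(transactions)
    both = sum(1 for t in transactions if antecedent | consequent <= t)
    ante = sum(1 for t in transactions if antecedent <= t)
    return both / n, both / ante

print(metrics({"Bread"}, {"Milk"}))      # (0.6, 0.75)
print(metrics({"Diapers"}, {"Beer"}))    # (0.6, 0.75)
```

Both rules meet the 60% minimum confidence threshold set above.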

Interpretation:

 The rule "{Bread} → {Milk}" with 75% confidence means that 75% of the customers
who bought bread also bought milk.
 The rule "{Diapers} -> {Beer}" with 75% confidence means that 75% of the customers
that bought diapers also bought beer.
 The rule "{Bread} -> {Diapers}" with 75% confidence means that 75% of the customers who bought bread also bought diapers.

Applications:

 Market Basket Analysis:


o Identifying products that are often purchased together.
o Optimizing product placement in stores.
 Recommendation Systems:
o Suggesting items to customers based on their past purchases.
 Cross-Selling:
o Promoting related products to customers.

By mining single-dimensional Boolean association rules, businesses can gain valuable insights
into customer behavior and optimize their strategies.

Understanding Multi-Dimensional Association Rules:

 Beyond Simple Transactions:


o Unlike basic market basket analysis, which focuses solely on "items bought,"
multi-dimensional association rules consider various attributes or dimensions.
o These dimensions can include:
 Customer demographics (age, income, location)
 Product characteristics (brand, category, price)
 Time-related information (day of the week, season)
 Relational Databases and Data Warehouses:
o These systems store data in structured tables, allowing us to analyze relationships
between multiple attributes.
o Data warehouses, in particular, are designed for analytical purposes, providing
summarized and integrated data that's ideal for multi-dimensional analysis.
 Quantitative and Categorical Attributes:
o Multi-dimensional rules can involve both:
 Categorical attributes: Discrete values (e.g., "color = red," "city =
London").
 Quantitative attributes: Numerical values (e.g., "age > 30," "price <
$100").
o One of the challenges is how to handle the quantitative attributes. Often they are
"discretized" or placed into bins, to be able to be used in the association rule
mining.

Key Concepts and Challenges:

 Discretization:
o Converting continuous quantitative attributes into discrete intervals.
o This can be done statically (before mining) or dynamically (during mining).
o Example: "age" can be discretized into "age = 18-25," "age = 26-35," etc.
 Concept Hierarchies:
o Organizing attributes into hierarchical levels of abstraction.
o Example: "location" can have levels like "city," "state," and "country."
o This allows us to discover rules at different levels of detail.
 Mining Techniques:
o Adapting traditional association rule algorithms (like Apriori) to handle multiple
dimensions.
o Developing new techniques that can efficiently handle the increased complexity.
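As a small illustration of static discretization, the quantitative attribute "age" can be mapped into labelled bins before mining. The bin boundaries below are arbitrary examples, not prescribed by any standard:

```python
# Minimal static-discretization sketch: mapping a quantitative attribute
# (age) into the labelled interval items used by association rules.
# Bin boundaries are illustrative assumptions.

def discretize_age(age):
    bins = [(18, 25, "age = 18-25"), (26, 35, "age = 26-35"), (36, 50, "age = 36-50")]
    for lo, hi, label in bins:
        if lo <= age <= hi:
            return label
    return "age = other"

print(discretize_age(30))   # age = 26-35
print(discretize_age(22))   # age = 18-25
```

After this step the discretized labels behave like any other categorical item, so standard frequent-itemset algorithms apply unchanged.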

Example:

Imagine a retail data warehouse with the following dimensions:

 Customer: age, income, location


 Product: category, brand, price
 Time: day of week, season

We might want to discover rules like:

 "IF (customer.age = 20-30) AND (product.category = 'electronics') THEN (product.brand = 'XYZ') (support = x%, confidence = y%)"
 "IF (customer.location = 'California') AND (time.season = 'Winter') THEN (product.category = 'Winter apparel') (support = x%, confidence = y%)"
 "IF (customer.income = 'high') AND (product.price = 'high') THEN (product.category = 'luxury goods') (support = x%, confidence = y%)"

Process:

1. Data Preprocessing:
o Clean and transform the data.
o Discretize quantitative attributes.
o Define concept hierarchies.
2. Frequent Itemset Generation:
o Find combinations of attribute-value pairs that meet a minimum support
threshold.
o This involves considering combinations across multiple dimensions.
3. Rule Generation:
o Generate association rules from the frequent itemsets.
o Calculate confidence and other metrics.
4. Rule Evaluation:
o Filter out uninteresting or redundant rules.
o Use domain knowledge to evaluate the significance of the rules.

Applications:

 Targeted Marketing:
o Identifying customer segments that are likely to respond to specific promotions.
 Sales Forecasting:
o Predicting product demand based on multiple factors.
 Fraud Detection:
o Identifying unusual patterns that may indicate fraudulent activity.
 Business Intelligence:
o Gaining deeper insights into business trends and customer behavior.

By mining multi-dimensional association rules, organizations can extract valuable knowledge from their data warehouses and make more informed decisions.

Explain with an example

Let's create a more concrete example to illustrate multi-dimensional association rule mining.

Scenario: Electronics Retailer Data Warehouse

We have a data warehouse for an electronics retailer with the following dimensions:

 Customer:
o Age (quantitative, discretized into ranges)
o Location (categorical: City)
 Product:
o Category (categorical)
o Price (quantitative, discretized into ranges)
 Time:
o Season (categorical)

Sample Data (Simplified):

Customer Age Customer Location Product Category Product Price Season


20-30 London Laptop High Winter
30-40 New York Headphones Medium Summer
20-30 London Mouse Low Winter
40-50 Tokyo Camera High Spring
20-30 London Laptop Medium Winter
30-40 New York Laptop High Summer
20-30 Tokyo Headphones Medium Spring
40-50 London Camera Medium Winter
30-40 New York Headphones Medium Summer
20-30 London Laptop High Winter

Mining Process:
1. Data Preprocessing:
o Age is discretized into ranges (20-30, 30-40, 40-50).
o Price is discretized into ranges (Low, Medium, High).
o Other attributes are already categorical.
2. Frequent Itemset Generation:
o Let's set a minimum support threshold of 3 (meaning an itemset must appear in at
least 3 transactions).
o We'll look for combinations of attribute-value pairs.
o Some frequent itemsets:
 {Customer Age = 20-30}: 5
 {Customer Location = London}: 5
 {Product Category = Laptop}: 4
 {Product Price = High}: 4
 {Season = Winter}: 5
 {Customer Age = 20-30, Customer Location = London}: 4
 {Customer Age = 20-30, Product Category = Laptop}: 3
 {Customer Age = 20-30, Season = Winter}: 4
 {Customer Location = London, Product Category = Laptop}: 3
 {Customer Location = London, Season = Winter}: 5
 {Product Category = Laptop, Product Price = High}: 3
 {Product Category = Laptop, Season = Winter}: 3
3. Rule Generation:
o Let's set a minimum confidence threshold of 60%.
o Example rules:
 {Customer Age = 20-30} -> {Customer Location = London} (support =
4/10, confidence = 4/5 = 80%)
 {Customer Age = 20-30} -> {Product Category = Laptop} (support =
3/10, confidence = 3/5 = 60%)
 {Product Category = Laptop} -> {Product Price = High} (support = 3/10,
confidence = 3/4 = 75%)
 {Customer Age = 20-30, Customer Location = London} -> {Product
Category = Laptop} (support = 3/10, confidence = 3/4 = 75%)
 {Customer Age = 20-30, Customer Location = London} -> {Season =
Winter} (support = 4/10, confidence = 4/4 = 100%)
4. Rule Interpretation:
o "80% of customers aged 20-30 are located in London."
o "60% of customers aged 20-30 purchase Laptops."
o "75% of Laptop purchases are high priced."
o "75% of customers aged 20-30 and living in London buy laptops."
o "All customers aged 20-30 who live in London made their purchases during the winter season."
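The support counts in step 2 can be recomputed by treating each table row as a set of "dimension=value" items. This is a hypothetical sketch (the `dims` names are shorthand for the table's column headers):

```python
# Sketch: recomputing multi-dimensional support counts from the retailer
# table; each row becomes a set of "dimension=value" items.

rows = [
    ("20-30", "London",   "Laptop",     "High",   "Winter"),
    ("30-40", "New York", "Headphones", "Medium", "Summer"),
    ("20-30", "London",   "Mouse",      "Low",    "Winter"),
    ("40-50", "Tokyo",    "Camera",     "High",   "Spring"),
    ("20-30", "London",   "Laptop",     "Medium", "Winter"),
    ("30-40", "New York", "Laptop",     "High",   "Summer"),
    ("20-30", "Tokyo",    "Headphones", "Medium", "Spring"),
    ("40-50", "London",   "Camera",     "Medium", "Winter"),
    ("30-40", "New York", "Headphones", "Medium", "Summer"),
    ("20-30", "London",   "Laptop",     "High",   "Winter"),
]
dims = ("Age", "Location", "Category", "Price", "Season")
transactions = [{f"{d}={v}" for d, v in zip(dims, row)} for row in rows]

def count(itemset):
    # Number of rows containing every dimension=value item.
    return sum(1 for t in transactions if itemset <= t)

print(count({"Age=20-30", "Location=London"}))                     # 4
print(count({"Age=20-30", "Location=London", "Category=Laptop"}))  # 3
```

The counts 4 and 3 match the support figures behind the rules {Customer Age = 20-30, Customer Location = London} → {Product Category = Laptop} and {Season = Winter}.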

Applications:

 Targeted Promotions:
o The retailer can target 20-30-year-olds in London with laptop promotions during
the winter.
 Product Placement:
o High-priced laptops can be prominently displayed in stores or online for 20-30-
year-old customers.
 Inventory Management:
o Ensure sufficient stock of laptops in London during the winter season.

This example shows how multi-dimensional association rule mining can uncover valuable
insights that go beyond simple item-to-item relationships.

Constraint-Based Association Mining is a specialized approach in association rule mining where constraints are used to guide the discovery of interesting patterns. Instead of mining all possible associations, constraints help filter out uninteresting or redundant rules, making the process more efficient and relevant.

Key Concepts

1. Association Rule Mining: Finds relationships between items in large datasets.


o Example: If a customer buys bread and butter, they are likely to buy jam too.

2. Constraints: Conditions or restrictions applied during mining to improve relevance and


efficiency.
o Examples:
 Item Constraints: Only consider rules with certain items.
 Length Constraints: Restrict the number of items in a rule.
 Support Constraints: Require rules to meet a minimum frequency threshold.
 Interestingness Constraints: Apply measures like confidence or lift to ensure
valuable rules.

Types of Constraints

1. Knowledge-Based Constraints: Based on domain knowledge (e.g., only mining rules related to
electronics).
2. Data-Based Constraints: Based on data properties (e.g., ignore infrequent items).
3. Interestingness Constraints: Based on statistical measures like support, confidence, and lift.
4. Monotonic Constraints: if an itemset satisfies the constraint, then every superset of it also satisfies the constraint (adding items can never break it).
5. Anti-Monotonic Constraints: if an itemset violates the constraint, then every superset of it also violates the constraint (adding items can never restore it).

Benefits

 Efficiency: Reduces computation time by pruning irrelevant rules.


 Relevance: Ensures that the discovered rules are meaningful.
 Scalability: Works better with large datasets.
Applications

 Market Basket Analysis (Retail)


 Fraud Detection (Banking)
 Medical Diagnosis (Healthcare)
 Web Usage Mining (E-commerce)

Constraint-Based Association Mining – A Detailed Explanation with Examples

Constraint-based association rule mining enhances the traditional association rule mining process
by applying various constraints that refine the results. Let’s go through different types of
constraints with detailed examples.

1. Item Constraints
Item constraints restrict the rules to contain specific items or exclude certain items.

Example:

In a supermarket dataset, assume the transactions contain the following items:


{Milk, Bread, Butter, Jam, Cheese, Eggs, Juice, Cereal}

 Include Constraint: Find association rules that must contain "Milk".


o Valid Rule: {Milk, Bread} → {Butter}
o Invalid Rule: {Bread} → {Butter} (Doesn’t contain "Milk")

 Exclude Constraint: Find rules that must not contain "Juice".


o Valid Rule: {Bread, Milk} → {Butter}
o Invalid Rule: {Juice, Bread} → {Cereal} (Contains "Juice")

✅ Use Case: If a retailer wants to analyze only dairy product purchases, they can apply an item
constraint to filter out non-dairy products.
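Include and exclude constraints are simple set-membership filters over candidate rules. The sketch below is illustrative (the rule list is hypothetical; each rule is an antecedent/consequent pair):

```python
# Sketch: applying include/exclude item constraints to candidate rules.
# Each rule is an (antecedent, consequent) pair of frozensets.

rules = [
    (frozenset({"Milk", "Bread"}), frozenset({"Butter"})),
    (frozenset({"Bread"}), frozenset({"Butter"})),
    (frozenset({"Juice", "Bread"}), frozenset({"Cereal"})),
]

def items_of(rule):
    # All items appearing anywhere in the rule.
    antecedent, consequent = rule
    return antecedent | consequent

# Include constraint: the rule must contain "Milk".
must_contain_milk = [r for r in rules if "Milk" in items_of(r)]
# Exclude constraint: the rule must not contain "Juice".
no_juice = [r for r in rules if "Juice" not in items_of(r)]

print(len(must_contain_milk), len(no_juice))   # 1 2
```

In practice such constraints are pushed into candidate generation rather than applied afterwards, which is where the efficiency gain comes from.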

2. Length Constraints
These constraints control the number of items in association rules.

Example:

 Minimum Length Constraint: Only consider rules with at least 3 items.


o Valid Rule: {Milk, Bread} → {Butter} (Contains 3 items)
o Invalid Rule: {Milk} → {Bread} (Contains only 2 items)

 Maximum Length Constraint: Only consider rules with at most 2 items.


o Valid Rule: {Milk} → {Bread}
o Invalid Rule: {Milk, Bread} → {Butter}

✅ Use Case: A business may want to focus only on simple purchase relationships (2-item rules)
instead of complex multi-item rules.

3. Support Constraints
Support measures how frequently an itemset appears in the dataset. A minimum support
constraint ensures that only frequent itemsets are considered.

Example:

Consider the following transactions in a store:

Transaction ID Items Purchased

1 Milk, Bread, Butter

2 Milk, Eggs, Cheese

3 Milk, Bread, Cereal

4 Bread, Butter, Jam

5 Bread, Milk, Butter

 Minimum Support Constraint: Support ≥ 40%


o {Milk, Bread} appears in 3 out of 5 transactions (60%) ✅ (Valid)
o {Eggs, Cheese} appears in 1 out of 5 transactions (20%) ❌ (Invalid)

✅ Use Case: In retail, support constraints help filter out rarely occurring itemsets that are not
useful for marketing strategies.
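The support check above can be verified directly against the five transactions. This is a minimal sketch (`support` is a hypothetical helper):

```python
# Sketch: checking the 40% minimum-support constraint against the
# five transactions in the table above.

transactions = [
    {"Milk", "Bread", "Butter"},
    {"Milk", "Eggs", "Cheese"},
    {"Milk", "Bread", "Cereal"},
    {"Bread", "Butter", "Jam"},
    {"Bread", "Milk", "Butter"},
]

def support(itemset):
    # Fraction of transactions containing every item in `itemset`.
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

MIN_SUPPORT = 0.40
for itemset in ({"Milk", "Bread"}, {"Eggs", "Cheese"}):
    s = support(itemset)
    print(sorted(itemset), s, s >= MIN_SUPPORT)
# ['Bread', 'Milk'] 0.6 True
# ['Cheese', 'Eggs'] 0.2 False
```

{Milk, Bread} passes the threshold (60%) while {Eggs, Cheese} (20%) is pruned, matching the example above.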

4. Interestingness Constraints (Confidence & Lift)


These constraints ensure that the discovered rules are not just frequent but also meaningful.
 Confidence: Measures the likelihood of the right-hand side (consequent) given the left-hand
side (antecedent).
 Lift: Measures how much more likely the consequent is when the antecedent occurs.

Example:

 Rule: {Milk, Bread} → {Butter}


o Confidence: 80% (80% of transactions with {Milk, Bread} also contain {Butter})
o Lift: 2.5 (Butter is 2.5 times more likely to be bought with {Milk, Bread} than
randomly)

 Rule: {Juice} → {Eggs}


o Confidence: 30%
o Lift: 0.9 (Less than 1 means no strong association) ❌ (Discard this rule)

✅ Use Case: In targeted promotions, a store might focus only on rules with high lift values,
ensuring strong purchase relationships.
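The lift computation can be sketched as follows. The support values below are illustrative assumptions, chosen to reproduce the 2.5 and 0.9 lift figures quoted above:

```python
# Sketch: computing lift from support values.
# lift(A -> B) = support(A ∪ B) / (support(A) * support(B))

def lift(support_ab, support_a, support_b):
    return support_ab / (support_a * support_b)

# Consequent co-occurs 2.5x more often than chance -> strong rule:
print(round(lift(0.20, 0.25, 0.32), 2))   # 2.5

# Lift below 1 means the items co-occur less often than chance -> discard:
print(round(lift(0.09, 0.25, 0.40), 2))   # 0.9
```

Unlike confidence, lift corrects for how popular the consequent is on its own, which is why a lift below 1 signals no real association even when confidence looks nontrivial.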

5. Monotonic Constraints
A monotonic constraint means that if an itemset satisfies the constraint, then any of its supersets
will also satisfy it.

Example:

A clothing store is interested only in itemsets that contain "Shirt".

The constraint "itemset contains Shirt" is monotonic: once {Shirt, Jeans} satisfies it, every superset such as {Shirt, Jeans, Shoes} automatically satisfies it as well. (Note that frequency itself is not monotonic: a superset of a frequent itemset may be infrequent.)

✅ Use Case: Helps in efficient mining, because once a constraint is satisfied it never needs to be re-checked as itemsets grow.

6. Anti-Monotonic Constraints
An anti-monotonic constraint means that if an itemset violates the constraint, then all its
supersets will also violate it.

Example:

Suppose a store wants to analyze rules where the total price of items is below $50.
 {Milk ($10), Bread ($15), Butter ($20)} → Total = $45 ✅ (Valid)
 {Milk ($10), Bread ($15), Butter ($20), Cheese ($30)} → Total = $75 ❌ (Invalid)

Since {Milk, Bread, Butter, Cheese} exceeds $50, any superset containing these items will
also be invalid.

✅ Use Case: Budget-constrained recommendations for customers who prefer lower-priced bundles.
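Anti-monotonic constraints enable pruning during candidate generation. The sketch below grows itemsets level by level under a "total price below $50" constraint; the prices are the illustrative figures from the example above:

```python
# Sketch of anti-monotonic pruning for a "total price below $50" constraint.
# Once a candidate itemset exceeds the budget it is discarded, and -- because
# the constraint is anti-monotonic -- none of its supersets is ever generated.

prices = {"Milk": 10, "Bread": 15, "Butter": 20, "Cheese": 30}

def within_budget(itemset, limit=50):
    return sum(prices[i] for i in itemset) < limit

items = sorted(prices)
frontier = [frozenset({i}) for i in items if within_budget({i})]
valid = list(frontier)
while frontier:
    nxt = set()
    for s in frontier:
        for i in items:
            if i not in s:
                cand = s | {i}
                if within_budget(cand):  # prune: violators' supersets never built
                    nxt.add(cand)
    frontier = list(nxt)
    valid.extend(frontier)

print(frozenset({"Milk", "Bread", "Butter"}) in valid)             # True  ($45)
print(frozenset({"Milk", "Bread", "Butter", "Cheese"}) in valid)   # False ($75)
```

The $75 itemset is never even constructed, which is exactly the efficiency benefit of anti-monotonic constraints.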

Real-World Applications
1. Market Basket Analysis (Retail)

 Discover frequent item purchases.


 Use constraints to focus on high-value items or seasonal products.

2. Fraud Detection (Banking)

 Detect unusual transaction patterns.


 Apply constraints to find only high-risk fraud indicators.

3. Medical Diagnosis (Healthcare)

 Identify disease correlations based on symptoms.


 Apply constraints to focus on high-confidence relationships.

4. Web Usage Mining (E-commerce)

 Find patterns in user behavior on websites.


 Use constraints to filter out rare or irrelevant browsing patterns.

You might also like