Data Mining Assignment
Samia, 17JZBCS0022 December 15, 2019
Association Analysis
Association rule mining is a technique for identifying underlying relationships between items.
Take the example of a supermarket where customers can buy a variety of items. Usually, there
is a pattern in what the customers buy. For instance, parents with babies buy baby products
such as milk and diapers. Young women may buy makeup items, whereas bachelors may buy beer
and chips. In short, transactions follow patterns. More profit can be generated if the
relationships between the items purchased in different transactions can be identified.
Apriori Algorithm for Association Rule Mining
Different statistical algorithms have been developed to implement association rule mining, and
Apriori is one of them. In this assignment we study the theory behind the Apriori algorithm
and then apply it in Python.
There are three major components of the Apriori algorithm:
• Support
• Confidence
• Lift
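To make the three measures concrete, here is a minimal sketch in plain Python. It uses the usual textbook definitions: support is the fraction of transactions containing an itemset, confidence of A -> B is support(A and B) / support(A), and lift is that confidence divided by support(B). The toy transactions and the helper names `support`, `confidence` and `lift` are made up for illustration and are not part of the assignment's dataset or of any library.

```python
# Toy transactions (hypothetical data, for illustration only).
transactions = [
    {"milk", "diapers", "bread"},
    {"milk", "diapers"},
    {"milk", "bread"},
    {"beer", "chips"},
    {"beer", "chips", "diapers"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(base, add, transactions):
    """support(base and add) / support(base): how often `add` appears given `base`."""
    return support(set(base) | set(add), transactions) / support(base, transactions)


def lift(base, add, transactions):
    """Confidence of base -> add divided by the support of `add`."""
    return confidence(base, add, transactions) / support(add, transactions)


print("support(milk, diapers):", support({"milk", "diapers"}, transactions))
print("confidence(milk -> diapers):", confidence({"milk"}, {"diapers"}, transactions))
print("lift(milk -> diapers):", lift({"milk"}, {"diapers"}, transactions))
```

A lift above 1 means the two sides of the rule occur together more often than they would if they were independent; a lift of exactly 1 means no association.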
Listing 1: Sample code for association analysis. Import the libraries, read the file, and display
the first rows of data.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from apyori import apriori

# header=None: the file has no header row, so the first line is read as data
store_data = pd.read_csv('C:/Users/me/Downloads/store_data.csv', header=None)

store_data.head()
Dataset:
The dataset describes associations between different products across 7500 transactions
recorded over the course of a week at a French retail store.
Because header=None was passed when reading the file, the first line is treated as a record
rather than as a header in this output.
We can find the number of rows and columns with:
store_data.shape
The above script should return (7501, 20), which means the data frame has 7501 rows and 20 columns.
Now we will use the Apriori algorithm to find out which items are commonly sold together, so
that the store owner can place related items together or advertise them together in order to
increase profit.
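Before turning to the library, the core idea of Apriori can be sketched in a few lines: an itemset can only be frequent if all of its subsets are frequent, so candidates of size k are built only from frequent itemsets of size k-1. The code below is a textbook-style sketch under that assumption, not the apyori implementation; the function name `frequent_itemsets` and the toy data are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise search: return {itemset: support} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    def keep_frequent(candidates):
        scored = {c: support(c) for c in candidates}
        return {c: s for c, s in scored.items() if s >= min_support}

    # Level 1: frequent single items.
    current = keep_frequent({frozenset([i]) for t in transactions for i in t})
    frequent = {}
    k = 1
    while current:
        frequent.update(current)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = keep_frequent(candidates)
    return frequent


# Toy transactions (hypothetical data, for illustration only).
toy_transactions = [
    ["milk", "diapers", "bread"],
    ["milk", "diapers"],
    ["milk", "bread"],
    ["beer", "chips"],
    ["beer", "chips", "diapers"],
]

result = frequent_itemsets(toy_transactions, min_support=0.4)
for itemset, value in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), value)
```

In this toy run the candidate {milk, diapers, bread} is discarded in the prune step without being counted, because its subset {diapers, bread} is not frequent. That early pruning is what keeps the search tractable on larger datasets.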
Data Preprocessing
The Apriori library we are going to use requires our dataset to be in the form of a list of lists,
where the whole dataset is a big list and each transaction in the dataset is an inner list within
the outer big list. Currently we have data in the form of a pandas dataframe. To convert our
pandas dataframe into a list of lists, execute the following script:
records = []
for i in range(0, 7501):
    # one inner list per transaction; every cell is converted to a string
    records.append([str(store_data.values[i, j]) for j in range(0, 20)])
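One side effect of this conversion is worth knowing: when rows in the file have fewer than 20 items, pandas fills the remaining cells with NaN, and str() turns each of those into the string 'nan', which the algorithm then sees as an ordinary item. The sketch below shows one possible alternative that reads each line with Python's csv module and skips empty cells. The sample text is a hypothetical stand-in for the file contents, and `read_transactions` is an illustrative helper, not part of the assignment. The rule counts quoted in the following sections come from the script above, so counts obtained with this variant may differ.

```python
import csv
import io

# Hypothetical stand-in for the contents of store_data.csv:
# one transaction per line, items separated by commas, no header row.
sample_csv = """milk,diapers,bread
beer,chips
milk,bread,,
"""


def read_transactions(lines):
    """Return one list of item names per non-empty line, skipping empty cells."""
    return [
        [cell.strip() for cell in row if cell.strip()]
        for row in csv.reader(lines)
        if any(cell.strip() for cell in row)
    ]


records = read_transactions(io.StringIO(sample_csv))
print(records)
```

With a real file, the same helper could be called as `read_transactions(f)` inside `with open(path, newline="") as f:`, where `path` is the location of the CSV file.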
Applying Apriori
The next step is to apply the Apriori algorithm to the dataset. To do so, we can use the apriori
function that we imported from the apyori library.
Execute the following script:
association_rules = apriori(records, min_support=0.0045, min_confidence=0.0002, min_lift=1, min_length=2)
association_results = list(association_rules)
Now print the number of rules:
print(len(association_results))
The script should return 1684.
Let's see what happens if we raise min_lift from 1 to 2:
association_rules = apriori(records, min_support=0.0045, min_confidence=0.0002, min_lift=2, min_length=2)
association_results = list(association_rules)
Now print the number of rules:
print(len(association_results))
The script should return 350. Now see what happens if we change the parameters again, raising
min_confidence to 0.2 and min_lift to 3:
association_rules = apriori(records, min_support=0.0045, min_confidence=0.2, min_lift=3, min_length=2)
association_results = list(association_rules)
In the second line we convert the rules found by the apriori function into a list, since the
results are easier to view in this form.
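The conversion matters for a second reason. If apriori produces its results lazily, as a generator (which is the assumption here), the returned object has no length and can be iterated only once. The toy generator below is a hypothetical stand-in for such a rule source and only demonstrates that behaviour in plain Python.

```python
def toy_rules():
    """Hypothetical stand-in for a lazily evaluated rule source."""
    for name in ("rule A", "rule B", "rule C"):
        yield name


lazy = toy_rules()

# A generator does not support len().
try:
    len(lazy)
    has_len = True
except TypeError:
    has_len = False

results = list(lazy)   # materialise the rules once
leftover = list(lazy)  # the generator is now exhausted

print(has_len, len(results), len(leftover))
```

For that reason, len() and indexing should be applied to the list, not to the object returned by the call itself.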
Viewing the Results
Let's first find the total number of rules mined by the apriori function. Execute the following script:
print(len(association_results))
The script above should return 48. We will stick with min_lift=3, as it gives an appropriate number of
rules.
Each item corresponds to one rule. Let's print the first item in the association_results list to
see the first rule. Execute the following script:
print(association_results[0])
The output should look like this:
RelationRecord(items=frozenset({'light cream', 'chicken'}),
    support=0.004532728969470737,
    ordered_statistics=[OrderedStatistic(items_base=frozenset({'light cream'}),
        items_add=frozenset({'chicken'}),
        confidence=0.29059829059829057, lift=4.84395061728395)])
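Each item in the list is itself a record containing three fields. The first field shows the
grocery items in the rule, the second is the support, and the third holds the ordered
statistics, which include the confidence and the lift.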
The following script displays the rule, the support, the confidence, and the lift for each rule
more clearly:
for item in association_results:
    # First field of the record: the items in the rule.
    # Caution: this is a frozenset, so the order of the two items
    # (and hence the arrow direction printed below) is not guaranteed.
    pair = item[0]
    items = [x for x in pair]
    print("Rule: " + items[0] + " -> " + items[1])

    # Second field of the record: the support
    print("Support: " + str(item[1]))

    # Third field: the list of ordered statistics; its first entry
    # holds the confidence at index 2 and the lift at index 3
    print("Confidence: " + str(item[2][0][2]))
    print("Lift: " + str(item[2][0][3]))
    print("=====================================")
If you execute the above script, you will see all the rules returned by the apriori function. The
first four rules look like this:
Results:
Rule: light cream -> chicken
Support: 0.004532728969470737
Confidence: 0.29059829059829057
Lift: 4.84395061728395
=====================================
Rule: mushroom cream sauce -> escalope
Support: 0.005732568990801126
Confidence: 0.3006993006993007
Lift: 3.790832696715049
=====================================
Rule: escalope -> pasta
Support: 0.005865884548726837
Confidence: 0.3728813559322034
Lift: 4.700811850163794
=====================================
Rule: ground beef -> herb & pepper
Support: 0.015997866951073192
Confidence: 0.3234501347708895
Lift: 3.2919938411349285
=====================================
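Because the printing script reads the item pair from a frozenset, the order in which the two items appear is arbitrary, so the arrow in a "Rule" line does not necessarily point from the rule's base item to its added item. A sketch of a more explicit alternative is to read the items_base and items_add fields shown in the printed record. The namedtuples below are stand-ins that mirror those field names so the example runs without apyori installed; that the real records allow the same attribute access is an assumption, and `format_rules` is a hypothetical helper.

```python
from collections import namedtuple

# Stand-ins mirroring the field names visible in the printed record;
# with apyori installed, the records would come from list(apriori(...)).
RelationRecord = namedtuple("RelationRecord", ["items", "support", "ordered_statistics"])
OrderedStatistic = namedtuple(
    "OrderedStatistic", ["items_base", "items_add", "confidence", "lift"]
)

# The first rule from the output above, rebuilt by hand.
sample_results = [
    RelationRecord(
        items=frozenset({"light cream", "chicken"}),
        support=0.004532728969470737,
        ordered_statistics=[
            OrderedStatistic(
                items_base=frozenset({"light cream"}),
                items_add=frozenset({"chicken"}),
                confidence=0.29059829059829057,
                lift=4.84395061728395,
            )
        ],
    )
]


def format_rules(results):
    """Return one 'base -> add' line per ordered statistic, highest lift first."""
    rows = [
        (stat.lift, stat.confidence, record.support,
         ", ".join(sorted(stat.items_base)), ", ".join(sorted(stat.items_add)))
        for record in results
        for stat in record.ordered_statistics
    ]
    rows.sort(reverse=True)
    return [
        f"{base} -> {add} (support={supp:.4f}, confidence={conf:.4f}, lift={lift_:.2f})"
        for lift_, conf, supp, base, add in rows
    ]


for line in format_rules(sample_results):
    print(line)
```

Reading the base and added items from named fields keeps the direction of each rule explicit, and sorting by lift puts the strongest associations first.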
Conclusion
Association rule mining algorithms such as Apriori are very useful for finding simple associations
between data items. They are easy to implement and highly explainable. For more advanced
insights, such as those used by Google or Amazon, more complex approaches, such as
recommender systems, are used. Still, this method is a very simple way to get basic
associations if that is all your use case needs.