Market Basket Analysis
Submitted By -
Group 4
Rohit Singh
Akshay Anand
Sushobhan Dutta
Saurabh Shimpi
Abhishek Giri
Kaustubh Tidke
Himanshu Patil
Ajey Verma
Market basket analysis is a method used to increase sales by understanding consumers' buying
patterns. It analyzes large data sets, such as consumer purchase histories, to identify product
groups and products that are likely to be purchased together. It also measures the strength of
the relationship between products purchased together and reveals patterns of co-occurrence.
Predictive market basket analysis: This type considers items purchased in sequence to
identify cross-selling opportunities.
Differential market basket analysis: This type compares data from different stores, as well as
purchases from different groups of customers at different times of the day, month or year.
If a rule holds in one dimension (for example, in one store, time period, or customer group)
but not in others, analysts can identify the factors responsible for the exception. These insights
can lead to new product offerings that drive sales.
Benefits of market basket analysis
Example
E-commerce websites such as Flipkart and Amazon apply market basket analysis on their product
pages: they recommend related products and products that are frequently bought together.
● Problem Statement
The problem statement of our project is to find the products that can be sold together to the
target customers of a UK-based and registered non-store online retailer. The data set is
transactional data containing all the transactions that occurred between 01/12/2010 and
09/12/2011. The objective of this project is to increase the sales and profit of the online
retail store, make useful recommendations to customers, and increase customer satisfaction.
● Association Rules
Association rule mining finds interesting associations and relationships among large sets of data
items. An association rule shows how frequently an itemset occurs in the transactions. A typical
example is Market Basket Analysis.
Association rules are created by thoroughly analyzing data and looking for frequent if/then
patterns. The important relationships are then identified using the following two measures:
1. Support: Support indicates how frequently the if/then relationship appears in the database,
that is, the fraction of transactions that contain all the items in the rule.
2. Confidence: Confidence indicates how often the relationship has been found to be true, that
is, the fraction of transactions containing the "if" part that also contain the "then" part.
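As a rough illustration of these two measures, the sketch below computes them for the rule bread ⇒ butter. It is written in plain Python rather than with the R packages used in this report, and the transactions and item names are made up for the example.

```python
# Toy transactions (hypothetical data, not taken from the report's dataset).
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions)

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return count(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Fraction of transactions with the "if" part that also contain the "then" part."""
    both = set(antecedent) | set(consequent)
    return count(both, transactions) / count(antecedent, transactions)

rule_support = support({"bread", "butter"}, transactions)
rule_confidence = confidence({"bread"}, {"butter"}, transactions)
print(f"support = {rule_support:.2f}, confidence = {rule_confidence:.2f}")
```

Here bread and butter appear together in 3 of the 5 transactions, and 3 of the 4 transactions containing bread also contain butter.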
So, in a given transaction with multiple items, Association Rule Mining primarily tries to find
the rules that govern how or why such products/items are often bought together.
Association Rule Mining is sometimes referred to as “Market Basket Analysis”, as it was the
first application area of association mining. The aim is to discover associations of items
occurring together more often than you’d expect from randomly sampling all the possibilities.
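The idea of items occurring together "more often than you'd expect" is commonly measured by lift: the observed co-occurrence divided by the co-occurrence expected if the items were independent. The following is a minimal Python sketch on made-up transactions, not the R workflow used in this report. A lift above 1 suggests a positive association, and a lift below 1 suggests the items appear together less often than chance.

```python
# Toy transactions (hypothetical data, not taken from the report's dataset).
transactions = [
    {"tea", "sugar"},
    {"tea", "sugar", "milk"},
    {"tea", "milk"},
    {"coffee", "sugar"},
    {"coffee", "milk"},
    {"tea", "sugar"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(set(itemset) <= t for t in transactions) / len(transactions)

def lift(x, y, transactions):
    """Observed joint support divided by the support expected under independence."""
    observed = support(set(x) | set(y), transactions)
    expected = support(x, transactions) * support(y, transactions)
    return observed / expected

tea_sugar_lift = lift({"tea"}, {"sugar"}, transactions)
coffee_sugar_lift = lift({"coffee"}, {"sugar"}, transactions)
print(f"tea => sugar: {tea_sugar_lift:.3f}, coffee => sugar: {coffee_sugar_lift:.3f}")
```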
• Association Rule (AR): an implication X ⇒ Y, where X, Y ⊆ I and X ∩ Y = ∅ (I is the set of all items)
Some areas where Association Rule Mining has helped quite a lot:
2. Medical Diagnosis
3. Census Data
4. Protein Sequence
Steps for Market Basket Analysis
The following are the steps of the Market Basket Analysis applied to the dataset of an online
retail store:
Acquiring Dataset: The dataset for this study was downloaded from Kaggle. It is transactional
data containing all the transactions that occurred between 01/12/2010 and 09/12/2011 for a
UK-based and registered non-store online retailer. Following are the details for
the dataset being used -
Data Pre-Processing: Data pre-processing has two major steps: eliminating the missing values
and cleaning the data set. We first check whether there are any missing values, which columns
they are in, and how many there are.
After this initial look at the missing values, we clean the data set by removing the
values and attributes that are not required for our association analysis. In our example, we
clean the Invoice Number and the Description of items in the dataset.
We then look for buzzwords, the undesirable words in the dataset, and remove them in
the next step, which gives us the dataset required for the rest of our analysis.
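The pre-processing described above might look roughly like the sketch below. It is plain Python rather than the R code used in this study. The column names `InvoiceNo` and `Description`, the sample rows, and the `UNWANTED` word list are assumptions made for illustration only.

```python
# Hypothetical sample rows; column names and values are assumptions, not the real data.
rows = [
    {"InvoiceNo": "536365", "Description": "WHITE METAL LANTERN"},
    {"InvoiceNo": "536365", "Description": ""},
    {"InvoiceNo": "536366", "Description": "  hand warmer  "},
    {"InvoiceNo": "", "Description": "RED MUG"},
    {"InvoiceNo": "536367", "Description": "POSTAGE"},
]
UNWANTED = {"POSTAGE"}  # hypothetical list of undesirable descriptions

# Step 1: count missing (empty) values in each column.
missing = {col: sum(1 for r in rows if not r[col].strip()) for col in rows[0]}

# Step 2: drop rows with missing values, tidy the text, and drop unwanted entries.
cleaned = []
for r in rows:
    invoice = r["InvoiceNo"].strip()
    description = r["Description"].strip().upper()
    if not invoice or not description or description in UNWANTED:
        continue
    cleaned.append({"InvoiceNo": invoice, "Description": description})

print(missing)
print(cleaned)
```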
Creating the Baskets: Now that the data is cleaned and pre-processed, we create the
baskets of items according to the transactions in the dataset. These baskets are the basis for
our further analysis and for creating the rules for Market Basket Analysis.
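One way to picture basket creation is to group the item descriptions by invoice number, so that each invoice becomes one basket. The sketch below does this in plain Python. The invoice numbers and items are invented for the example, and the report's own baskets were built in R.

```python
from collections import defaultdict

# Hypothetical (invoice number, item description) pairs after cleaning.
rows = [
    ("536365", "WHITE METAL LANTERN"),
    ("536365", "RED MUG"),
    ("536366", "HAND WARMER"),
    ("536365", "RED MUG"),  # a repeated line collapses into the same basket
    ("536367", "RED MUG"),
    ("536367", "HAND WARMER"),
]

# Each invoice number maps to the set of distinct items bought on that invoice.
baskets = defaultdict(set)
for invoice, item in rows:
    baskets[invoice].add(item)

transactions = list(baskets.values())
for invoice, items in baskets.items():
    print(invoice, sorted(items))
```

Using a set per invoice means each basket records which items were bought, not their quantities.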
Creating Rules: For the association rules described in the section above, we use Support,
Confidence and Lift as our metrics. Lift compares how often the items in a rule occur together
with how often they would occur together if they were independent. In our study, we generate
two sets of rules: one with a support of 5% and a confidence of 75%, and the other with a
support of 1% and a confidence of 70%. After creating the rules, we sort them: the first set by
decreasing value of Lift and the second set by decreasing value of Confidence.
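The sketch below illustrates this step under simplifying assumptions. It enumerates only one-item ⇒ one-item rules by brute force, which is not the algorithm implemented in the Arules package, and it runs on made-up baskets. The two threshold pairs are the ones stated above, treated here as minimum values. The function name `mine_pair_rules` is hypothetical.

```python
from itertools import permutations

# Toy baskets (hypothetical data, not taken from the report's dataset).
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]

def count(itemset):
    """Number of baskets that contain every item in `itemset`."""
    return sum(set(itemset) <= t for t in transactions)

def mine_pair_rules(min_support, min_confidence):
    """Brute-force one-item => one-item rules that meet both thresholds."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    rules = []
    for lhs, rhs in permutations(items, 2):
        together = count({lhs, rhs})
        supp = together / n
        if supp < min_support:
            continue
        conf = together / count({lhs})
        if conf < min_confidence:
            continue
        lift = conf / (count({rhs}) / n)
        rules.append({"lhs": lhs, "rhs": rhs, "support": supp,
                      "confidence": conf, "lift": lift})
    return rules

# First rule set: 5% support, 75% confidence, sorted by decreasing lift.
rules_a = sorted(mine_pair_rules(0.05, 0.75), key=lambda r: r["lift"], reverse=True)
# Second rule set: 1% support, 70% confidence, sorted by decreasing confidence.
rules_b = sorted(mine_pair_rules(0.01, 0.70), key=lambda r: r["confidence"], reverse=True)

for r in rules_a:
    print(f'{r["lhs"]} => {r["rhs"]}: support={r["support"]:.2f}, '
          f'confidence={r["confidence"]:.2f}, lift={r["lift"]:.2f}')
```

On data this small both threshold pairs keep the same rules. On a real transaction set, the lower thresholds of the second set would normally admit many more rules.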
Data Visualization: We then visualize the set of rules through scatter plots and graphs to
find the most effective recommended associations among the products in our transaction
history. The data visualization, results and interpretation can be seen in the later sections
of this report.
The following packages are used in our association-rule-based Market Basket Analysis.
These libraries are free to download and are available for the latest version of R. For the
complete documentation of these packages, please refer to the Sources section of the
report.
1. Stringr - The Stringr package provides functions for working with strings, which are very
helpful in data cleaning and preparation tasks. Its functions are designed to make working with
strings as easy as possible.
2. Plyr - The Plyr package is used for splitting, applying and combining data. It addresses a
common pattern: split a large data structure into homogeneous pieces, apply a function to each
piece and combine all the results.
3. Arules - Arules is an association rule package that provides an infrastructure to represent,
manipulate and analyse transactional data and the patterns found in it, namely frequent
itemsets and association rules.
4. ArulesViz - ArulesViz is the package used to visualize the association rules generated in
our analysis. It is an extension of the Arules package and is used to plot interactive
visualizations for exploring those rules.
● Interpretation
● Sources
● Stringr Package
https://www.rdocumentation.org/packages/stringr/versions/1.4.0#:~:text=stringr%20is%20built%20on%20top,almost%20anything%20you%20can%20imagine
● Plyr Package
https://www.rdocumentation.org/packages/plyr
● Arules Package
https://www.rdocumentation.org/packages/arules
● Arulesviz Package
https://www.rdocumentation.org/packages/arules
Data Set used for the project:
Transactional data set which contains all the transactions occurring between
01/12/2010 and 09/12/2011 for a UK-based and registered non-store online retail
Visualization:
Conclusion:
Based on the results, we can recommend certain items for customers to buy.
We can bundle items together using bundle pricing in order to increase sales.