

Problem Statement

You are the sales manager for "BeerMart", an online beer store in the United States. You want to
build a collaborative recommendation system for your store, in which customers are
recommended the beers that they are most likely to buy. You have collected data on the ratings that
customers have provided in the past. You can download the dataset from the link below.

Description: Each record consists of a beer's name, the name of the user, and the rating that
the user gave the beer. All ratings are on a scale from 1 to 5, with 5 being the best.
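
As a rough illustration of the record layout described above, the following sketch represents each record as a (beer, user, rating) tuple, drops ratings that fall outside the 1-to-5 scale, and lists the distinct rating values that remain. The sample rows and the names `records`, `is_valid`, and `unique_ratings` are hypothetical and do not come from the BeerMart dataset.

```python
from collections import Counter

# Hypothetical sample records: (beer name, user name, rating).
# These rows are made up for illustration only.
records = [
    ("Pale Ale", "alice", 4.0),
    ("Pale Ale", "bob", 3.5),
    ("Stout", "alice", 5.0),
    ("Stout", "carol", 2.0),
    ("Lager", "bob", 6.0),  # outside the 1-to-5 scale, so it is rejected below
]


def is_valid(record):
    """Return True if the rating lies on the 1-to-5 scale."""
    _, _, rating = record
    return 1 <= rating <= 5


# Keep only records whose rating is on the documented scale.
valid = [r for r in records if is_valid(r)]

# Count how often each rating value occurs, then list the distinct values.
rating_counts = Counter(rating for _, _, rating in valid)
unique_ratings = sorted(rating_counts)

print(unique_ratings)
```

A check like this is one way to spot out-of-range values before any filtering or modelling; whether the real data contains such values is something only EDA on the actual dataset can show.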

As you solve the case study, answer the following questions:

1. Data preparation
   1. Choose only those beers that have at least N reviews.
      - Figure out an appropriate value of N using EDA; this may not have one correct
        answer, but you shouldn't choose beers that have an extremely low number of
        ratings.
   2. Convert this data frame to a "realRatingMatrix" before you build your collaborative
      filtering models.
2. Data exploration
   1. Determine how similar the first ten users are to each other and visualise it.
   2. Compute and visualise the similarity between the first 10 beers.
   3. What are the unique values of ratings?
   4. Visualise the rating values and note:
      - The average beer ratings
      - The average user ratings
      - The average number of ratings given to the beers
      - The average number of ratings given by the users
3. Recommendation models
   1. Divide your data into training and testing datasets.
      - Experiment with 'split' and 'cross-validation' evaluation schemes.
   2. Build IBCF and UBCF models.
   3. Compare the performance of the two models and suggest the one that should be
      deployed.
      - Plot the ROC curves for UBCF and IBCF and compare them.
   4. Give the names of the top 5 beers that you would recommend to the users "cokes",
      "genog" and "giblet".

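The core computations behind the tasks above can be sketched in a few lines. The example below uses plain Python rather than the R package that provides realRatingMatrix, so it is not the workflow the assignment asks for, only a small illustration of the ideas: keeping beers with at least N reviews, measuring cosine similarity between users, and producing a user-based (UBCF-style) top-N list. All data and function names are hypothetical, and the scoring rule is a simplified assumption, not any library's implementation.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical ratings: (beer, user, rating). Made-up data for illustration.
ratings = [
    ("IPA", "u1", 5), ("IPA", "u2", 4), ("IPA", "u3", 1),
    ("Stout", "u1", 4), ("Stout", "u2", 5),
    ("Lager", "u2", 2), ("Lager", "u3", 5),
    ("Porter", "u3", 4),  # only one review, so it is dropped when n=2
]


def filter_min_reviews(rows, n):
    """Keep only rows whose beer has at least n reviews."""
    counts = Counter(beer for beer, _, _ in rows)
    return [row for row in rows if counts[row[0]] >= n]


def user_vectors(rows):
    """Build a sparse user -> {beer: rating} mapping."""
    vectors = defaultdict(dict)
    for beer, user, rating in rows:
        vectors[user][beer] = rating
    return dict(vectors)


def cosine(a, b):
    """Cosine similarity over the beers both users rated; 0.0 if none."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[k] * b[k] for k in shared)
    norm_a = sqrt(sum(a[k] ** 2 for k in shared))
    norm_b = sqrt(sum(b[k] ** 2 for k in shared))
    return dot / (norm_a * norm_b)


def recommend(user, vectors, top_n=5):
    """Rank beers the user has not rated by the similarity-weighted
    average of other users' ratings (a simplified UBCF-style score)."""
    scores = defaultdict(float)
    weights = defaultdict(float)
    for other, their in vectors.items():
        if other == user:
            continue
        sim = cosine(vectors[user], their)
        if sim <= 0:
            continue
        for beer, rating in their.items():
            if beer not in vectors[user]:
                scores[beer] += sim * rating
                weights[beer] += sim
    ranked = sorted(scores, key=lambda b: scores[b] / weights[b], reverse=True)
    return ranked[:top_n]


kept = filter_min_reviews(ratings, n=2)
vectors = user_vectors(kept)
recs_u1 = recommend("u1", vectors)
print(recs_u1)
```

With so few shared ratings, similarities computed over co-rated beers are unstable (two users who share a single beer always score 1.0), which is one reason the brief asks for a minimum number of reviews per beer. The value `n=2` here is arbitrary and chosen only to fit the toy data.
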
Your assignment will be evaluated using the following guidelines.

Step: Data preparation
  Meets expectations: The data has been converted to realRatingMatrix; beers having at least N
    ratings have been chosen to build the model
  Does not meet expectations: The data is in incorrect format; beers have not been selected on
    the basis of minimum number of ratings

Step: Data exploration
  Meets expectations: Visualisations have been created to understand the similarity between
    first 10 beers and users; the average ratings and number of ratings for beers and users
    have been reported/visualised
  Does not meet expectations: Visualisations have not been created; the average ratings and
    number of ratings of beers and users are not reported or plotted

Step: Recommendation models
  Meets expectations: Has experimented with split and cross-validation methods; UBCF & IBCF
    models have been built and compared using ROC curve or any other relevant metric
  Does not meet expectations: Not experimented with the two methods; ROC curve not
