L6 Recommendation


COMP6237 Data Mining

Making Recommendations
Jonathon Hare

jsh2@ecs.soton.ac.uk
Introduction

Recommender systems 101

Taxonomy of recommender systems

Collaborative Filtering

Collecting user preferences as sparse vectors

Finding similar users

User and item based filtering


Problem statement: making recommendations

Can we mine data to predict:

which things people might like to buy

which films they would like to watch

which websites and books they might like to read

which people they might like to date

on the basis of past behaviour and shared tastes?


Recommender systems 101
Amazon makes recommendations based on past purchase history

Google News recommends articles based on click and search history - millions of users, millions of articles (Das et al.)

Netflix predicts movies you might like based on past numeric ratings

OkCupid predicts people you might like based on past site usage/behaviour and answers to questions

Facebook predicts people you might know based on your position within a social network made up of friends with different sets of interests, geographical locations, educational backgrounds, etc.
Recommender systems taxonomy
Collaborative Filtering

(Illustration: a group of users and a group of items; users express preferences such as "I like oranges" and "I like kiwis")

Observe user preferences

Make predictions about new preferences: does Bob like strawberries?
Collaborative Filtering and Recommender Systems
Insight: personal preferences are correlated

If Jack loves A and B, and Jill loves A, B, and C, then Jack is more
likely to love C

Collaborative Filtering Task

Discover patterns in observed preference behaviour (e.g. purchase history, item ratings, click counts) across a community of users

Predict new preferences based on those patterns

Does not rely on item or user attributes (e.g. demographic info, author,
genre)

Content-based recommendation: complementary approach


Recommender systems taxonomy

(Taxonomy diagram; this lecture covers collaborative filtering)
Collecting user preferences

Need to make measurements of user preferences.

For example, consider film recommendation:

Might ask users to rate films between 0 and 5 stars

Ratings could be continuous (allowing fractional values), or could be restricted to integers
Mapping user actions to numerical scores

Many different ways to map user preferences to


numerical scores

Will depend on the variables that can be measured

For example:

Concert Tickets:    Bought = 1, Didn't buy = 0
Online Shopping:    Bought = 2, Browsed = 1, Didn't buy = 0
Site Recommender:   Liked = 1, No vote = 0, Disliked = -1
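As a sketch of how such a mapping might look in code (the dictionary layout and action names below are illustrative, not part of the lecture):

```python
# Hypothetical action-to-score mappings for three different services.
ACTION_SCORES = {
    'concert_tickets':  {'bought': 1, 'did_not_buy': 0},
    'online_shopping':  {'bought': 2, 'browsed': 1, 'did_not_buy': 0},
    'site_recommender': {'liked': 1, 'no_vote': 0, 'disliked': -1},
}

print(ACTION_SCORES['online_shopping']['browsed'])  # 1
```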
Example dataset:

         Lady in    Snakes on  Just My  Superman  The Night  You, Me
         the Water  a Plane    Luck     Returns   Listener   and Dupree
Lisa     2.5        3.5        3.0      3.5       3.0        2.5
Gene     3.0        3.5        1.5      5.0       3.0        3.5
Michael  2.5        3.0        -        3.5       4.0        -
Claudia  -          3.5        3.0      4.0       4.5        2.5
Mick     3.0        4.0        2.0      3.0       3.0        2.0
Jack     3.0        4.0        -        5.0       3.0        3.5
Toby     -          4.5        -        4.0       -          1.0
This dataset is sparse. It has missing values!

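For the sketches in the rest of these notes, one convenient in-memory representation of this table is a dictionary of dictionaries, with missing ratings simply absent (this nested-dict layout is an implementation choice, not something the slides prescribe):

```python
# Critics dataset: user -> {film -> rating}; missing ratings are absent keys.
critics = {
    'Lisa':    {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5,
                'Just My Luck': 3.0, 'Superman Returns': 3.5,
                'The Night Listener': 3.0, 'You, Me and Dupree': 2.5},
    'Gene':    {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5,
                'Just My Luck': 1.5, 'Superman Returns': 5.0,
                'The Night Listener': 3.0, 'You, Me and Dupree': 3.5},
    'Michael': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.0,
                'Superman Returns': 3.5, 'The Night Listener': 4.0},
    'Claudia': {'Snakes on a Plane': 3.5, 'Just My Luck': 3.0,
                'Superman Returns': 4.0, 'The Night Listener': 4.5,
                'You, Me and Dupree': 2.5},
    'Mick':    {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
                'Just My Luck': 2.0, 'Superman Returns': 3.0,
                'The Night Listener': 3.0, 'You, Me and Dupree': 2.0},
    'Jack':    {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
                'Superman Returns': 5.0, 'The Night Listener': 3.0,
                'You, Me and Dupree': 3.5},
    'Toby':    {'Snakes on a Plane': 4.5, 'Superman Returns': 4.0,
                'You, Me and Dupree': 1.0},
}
```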
Aside: Sparse data
This sparsity is important

We can use it to our advantage:

More efficient storage

Faster computation

But we have to use appropriate data structures

Sparse vectors

Typically implemented with hash maps (fast random insertion) or parallel sorted arrays (slow random insertion but faster reads)

It has disadvantages too
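As a toy illustration of the hash-map option (this class is my own sketch, not lecture code): only the non-zero entries are stored, so storage and dot products scale with the number of non-zeros rather than the dimensionality.

```python
class SparseVector:
    """Toy sparse vector backed by a hash map: index -> non-zero value."""

    def __init__(self, dimensionality):
        self.dim = dimensionality
        self.entries = {}  # absent indices are implicitly zero

    def __setitem__(self, i, value):
        if value == 0:
            self.entries.pop(i, None)  # keep the map sparse
        else:
            self.entries[i] = value

    def __getitem__(self, i):
        return self.entries.get(i, 0)

    def dot(self, other):
        # Iterate over the vector with fewer stored entries: cost depends
        # on the number of non-zeros, not on self.dim.
        a, b = self.entries, other.entries
        if len(b) < len(a):
            a, b = b, a
        return sum(v * b.get(i, 0) for i, v in a.items())

v, w = SparseVector(1_000_000), SparseVector(1_000_000)
v[3], v[999_999] = 2.5, 4.0
w[3] = 2.0
print(v.dot(w))  # 5.0
```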


Aside: spaces, distances and similarity
(Pipeline diagram: DATA (input) → Preprocessing → Feature Extraction → Intelligent Algorithm / Data Mining → INFORMATION (output))
Key terminology
feature vector: a mathematical vector

just a list of (usually real) numbers

has a fixed number of elements in it

the number of elements is the dimensionality of the vector

represents a point in a feature space, or equally a direction in the feature space

the dimensionality of a feature space is the dimensionality of every vector within it

vectors of differing dimensionality can't exist in the same feature space
Distance and similarity

Feature extractors are often defined so that they produce vectors that are close together for similar inputs

The closeness of two vectors can be computed in the feature space by measuring a distance between the vectors
Euclidean distance (L2 distance)

The L2 distance is the most intuitive distance: the straight-line distance between two points

Computed via an extension of Pythagoras' theorem to n dimensions:

$$D_2(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2} = \|p - q\| = \sqrt{(p - q) \cdot (p - q)}$$
L1 distance (aka Taxicab/Manhattan distance)

The L1 distance is computed along paths parallel to the axes of the space:

$$D_1(p, q) = \|p - q\|_1 = \sum_{i=1}^{n} |p_i - q_i|$$
The Lp distances

Generalisation of the L1 and L2 distances to higher orders:

$$D_p(x, y) = \|x - y\|_p = \left(\sum_{i=1}^{n} |x_i - y_i|^p\right)^{1/p}$$
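A minimal sketch of these distances (the function name and the use of plain Python lists are my own choices); p=1 and p=2 recover the L1 and L2 distances:

```python
def minkowski(p_vec, q_vec, p=2):
    """L_p (Minkowski) distance between two equal-length vectors.

    p=1 gives the taxicab (L1) distance; p=2 gives the Euclidean (L2) distance.
    """
    return sum(abs(a - b) ** p for a, b in zip(p_vec, q_vec)) ** (1.0 / p)

print(minkowski([0, 0], [3, 4], p=2))  # 5.0 (straight-line distance)
print(minkowski([0, 0], [3, 4], p=1))  # 7.0 (taxicab distance)
```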
Cosine Similarity

Cosine similarity measures the cosine of the angle between two vectors

It is not a distance!

$$\cos(\theta) = \frac{p \cdot q}{\|p\| \, \|q\|} = \frac{\sum_{i=1}^{n} p_i q_i}{\sqrt{\sum_{i=1}^{n} p_i^2} \, \sqrt{\sum_{i=1}^{n} q_i^2}}$$

Useful if you don't care about the relative length of the vectors
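A matching sketch for cosine similarity; the final line illustrates why it ignores vector length:

```python
import math

def cosine_similarity(p, q):
    """Cosine of the angle between vectors p and q."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)

# Scaling a vector does not change the similarity - only direction matters:
print(cosine_similarity([1, 2], [2, 4]))  # 1.0
```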
Measuring user similarity

Can we use this data to measure and rank the similarity of users?

Need to define a similarity measure or similarity score

On the basis that similar users have similar tastes (i.e. like the same movies)

Must take into account sparsity: not all users have seen all movies

Typically the score is a bounded numeric value: 1 means the same; 0 means completely dissimilar


Demo: Plots of Euclidean Space
Euclidean Similarity

There are many ways to compute a similarity based on Euclidean distance

We could choose:

$$sim_{L2}(x, y) = \frac{1}{1 + \sqrt{\sum_{i \in I_{xy}} (r_{x,i} - r_{y,i})^2}}$$

where $r_{x,i}$ refers to the rating given by user x to item i, and $I_{xy}$ is the set of items rated by both user x and user y
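A direct transcription of this formula against the critics dictionary above (returning 0 when two users share no rated items is my own convention):

```python
import math

def sim_l2(prefs, x, y):
    """1 / (1 + L2 distance) over the items rated by both x and y."""
    shared = set(prefs[x]) & set(prefs[y])
    if not shared:
        return 0.0  # no overlap: treat the users as completely dissimilar
    dist_sq = sum((prefs[x][i] - prefs[y][i]) ** 2 for i in shared)
    return 1.0 / (1.0 + math.sqrt(dist_sq))

print(sim_l2(critics, 'Lisa', 'Gene'))  # ~0.294
```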
Pearson Correlation

An alternative measure of similarity is to compute the correlation of the users based on the ratings they share

Pearson's correlation is the standard measure of dependence between two random variables:

$$sim_{pearson}(x, y) = \frac{\sum_{i \in I_{xy}} (r_{x,i} - \bar{r}_x)(r_{y,i} - \bar{r}_y)}{\sqrt{\sum_{i \in I_{xy}} (r_{x,i} - \bar{r}_x)^2} \, \sqrt{\sum_{i \in I_{xy}} (r_{y,i} - \bar{r}_y)^2}}$$

where $\bar{r}_x$ is the average rating user x gave for all items in $I_{xy}$
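A sketch of the same computation (again, returning 0 in the degenerate cases is my own convention; note the means are taken over the shared items $I_{xy}$ only):

```python
import math

def sim_pearson(prefs, x, y):
    """Pearson correlation of x's and y's ratings over their shared items."""
    shared = set(prefs[x]) & set(prefs[y])
    n = len(shared)
    if n == 0:
        return 0.0
    mean_x = sum(prefs[x][i] for i in shared) / n
    mean_y = sum(prefs[y][i] for i in shared) / n
    num = sum((prefs[x][i] - mean_x) * (prefs[y][i] - mean_y) for i in shared)
    den = (math.sqrt(sum((prefs[x][i] - mean_x) ** 2 for i in shared)) *
           math.sqrt(sum((prefs[y][i] - mean_y) ** 2 for i in shared)))
    if den == 0:
        return 0.0  # a user gave every shared item the same rating
    return num / den

print(sim_pearson(critics, 'Lisa', 'Jack'))  # ~0.747, despite Jack's higher ratings
```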
Demo: Plots of Correlation
Important Aside: Grade inflation

Users are notoriously bad at giving consistent absolute ratings

two users might agree that they both enjoyed a film, but one might give it a 4 and the other a 5

e.g. compare Lisa & Jack in the sample data

Pearson correlation corrects for this automatically, but the Euclidean similarity doesn't

Data normalisation and mean centring (data standardisation) can overcome this
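As a sketch, mean centring just subtracts each user's own average from their ratings before any comparison:

```python
def mean_centre(ratings):
    """Subtract the user's own average rating from each of their ratings."""
    mu = sum(ratings.values()) / len(ratings)
    return {item: r - mu for item, r in ratings.items()}

# After centring, a habitual 4-giver and a habitual 5-giver who agree
# about a film produce similar values.
print(mean_centre(critics['Jack']))
```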
User-based Filtering: Rating and Ranking the critics

We now have a set of measures for computing the similarity between users

Can use this to produce a ranked list of the best matches (most similar users) to a target user

Typically want the top-N users

When computing the ranked list, might only want to consider a subset of users, e.g. those who rated a particular item (see the sketch below)


Demo: User-User similarity
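A sketch of what such a demo might compute, ranking every other user by similarity to a target user (the function name is my own):

```python
def top_matches(prefs, user, n=5, similarity=sim_pearson):
    """Return the n users most similar to `user`, most similar first."""
    scores = [(similarity(prefs, user, other), other)
              for other in prefs if other != user]
    scores.sort(reverse=True)
    return scores[:n]

print(top_matches(critics, 'Toby', n=3))
```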
User-based Filtering: Recommending Items

Now we know how to find similar users, how can we recommend items?

Basic idea: predict the rating, $r_{u,i}$, of an item i by user u as an aggregation of the ratings of item i by the users similar to u:

$$r_{u,i} = \mathrm{agg}_{u' \in U}(r_{u',i})$$

where U is the set of top users most similar to u that rated item i
One possible aggregation approach:

Weight the scores of each item each similar user has rated, i.e. multiply the item scores by the respective user's similarity, and combine into a score for each item

Problem: can't just add them together - items that have more ratings would always have an advantage

Need to normalise by the sum of the similarities:

$$r_{u,i} = \frac{\sum_{u' \in U} sim(u, u') \, r_{u',i}}{\sum_{u' \in U} |sim(u, u')|}$$
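One way this aggregation could be implemented (restricting to positively-similar users, which makes |sim(u, u')| equal to sim(u, u') in the denominator, is my own simplification):

```python
def recommend_user_based(prefs, user, similarity=sim_pearson):
    """Predict ratings for the items `user` has not rated, as a
    similarity-weighted average of the other users' ratings."""
    totals, sim_sums = {}, {}
    for other in prefs:
        if other == user:
            continue
        sim = similarity(prefs, user, other)
        if sim <= 0:
            continue  # ignore dissimilar users; makes |sim| == sim below
        for item, rating in prefs[other].items():
            if item in prefs[user]:
                continue  # only predict unrated items
            totals[item] = totals.get(item, 0.0) + sim * rating
            sim_sums[item] = sim_sums.get(item, 0.0) + sim
    ranked = [(totals[i] / sim_sums[i], i) for i in totals]
    ranked.sort(reverse=True)
    return ranked

print(recommend_user_based(critics, 'Toby'))
```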
Demo: User-based recommendation
(Some) other aggregation options

Don't weight by similarity score, but still compute the average over only the similar users:

$$r_{u,i} = \frac{1}{N} \sum_{u' \in U} r_{u',i}$$

Compensate for users that have high or low averages by aggregating deviations from each user's mean:

$$r_{u,i} = \bar{r}_u + \frac{\sum_{u' \in U} sim(u, u') \, (r_{u',i} - \bar{r}_{u'})}{\sum_{u' \in U} |sim(u, u')|}$$

where $\bar{r}_{u'}$ is the average score of user u' over all items they have scored
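A sketch of the mean-centred variant (here the user means are taken over all of each user's rated items, per the definition above):

```python
def recommend_mean_centred(prefs, user, similarity=sim_pearson):
    """Predict as the user's own mean plus a weighted average of the
    other users' deviations from their own means."""
    means = {u: sum(r.values()) / len(r) for u, r in prefs.items()}
    totals, sim_sums = {}, {}
    for other in prefs:
        if other == user:
            continue
        sim = similarity(prefs, user, other)
        if sim == 0:
            continue
        for item, rating in prefs[other].items():
            if item in prefs[user]:
                continue
            totals[item] = totals.get(item, 0.0) + sim * (rating - means[other])
            sim_sums[item] = sim_sums.get(item, 0.0) + abs(sim)
    ranked = [(means[user] + totals[i] / sim_sums[i], i) for i in totals]
    ranked.sort(reverse=True)
    return ranked

print(recommend_mean_centred(critics, 'Toby'))
```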
Matching Products: More like this

In just the same way that we computed the similarity between users, we can flip (transpose) the problem and compute the similarity between items

Can use this as a fuzzy basis for recommending alternative items

In a future lecture we'll look at a more structured way of identifying which products people buy together, using affinity analysis or Market Basket Analysis
Demo: Item-Item similarity
Problems with user-based filtering

User-based filtering relies on computing the similarity against every user

With millions of users this might be a problem - it is computationally hard!

If there are thousands of products there might be little overlap between users, making effective similarity computation hard


Item-based collaborative filtering

Designed to work around the problems of user-based filtering

Based on the idea that comparisons between items will not change as frequently as comparisons between users

Steps:

Precompute the most similar items for each item and store them

To make a recommendation for a user, look at their top-rated items and aggregate items similar to those from the precomputed item similarities

The precomputed item similarities will obviously change with new ratings, but they will do so slowly
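A sketch of the precomputation step, reusing top_matches and sim_l2 from the earlier sketches: transpose the preference matrix so that items play the role of users, then rank each item's neighbours:

```python
def transform_prefs(prefs):
    """Flip user -> {item: rating} into item -> {user: rating}."""
    flipped = {}
    for user, ratings in prefs.items():
        for item, rating in ratings.items():
            flipped.setdefault(item, {})[user] = rating
    return flipped

def similar_items(prefs, n=10, similarity=sim_l2):
    """Precompute the n most similar items for every item."""
    item_prefs = transform_prefs(prefs)
    return {item: top_matches(item_prefs, item, n=n, similarity=similarity)
            for item in item_prefs}

item_sims = similar_items(critics, n=5)
```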
Demo: Precomputing Item-Item similarity
Computing recommendations

For an item j that user u has not yet rated, but that has a top-N similarity to items the user has rated, estimate the rating as:

$$r_{u,j} = \frac{\sum_{i \in I} sim(j, i) \, r_{u,i}}{\sum_{i \in I} sim(j, i)}$$

where I is the subset of the N items most similar to j that have been rated by user u
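A sketch of this estimate following the common pattern of walking the user's own ratings and the stored neighbour lists (item_sims comes from the precomputation sketch above; skipping non-positive similarities is my own simplification):

```python
def recommend_item_based(prefs, item_sims, user):
    """Score unrated items from the user's own ratings and the
    precomputed item-item similarity lists."""
    user_ratings = prefs[user]
    scores, sim_totals = {}, {}
    for rated_item, rating in user_ratings.items():
        for sim, candidate in item_sims[rated_item]:
            if candidate in user_ratings or sim <= 0:
                continue  # skip already-rated items and non-positive sims
            scores[candidate] = scores.get(candidate, 0.0) + sim * rating
            sim_totals[candidate] = sim_totals.get(candidate, 0.0) + sim
    ranked = [(scores[i] / sim_totals[i], i) for i in scores]
    ranked.sort(reverse=True)
    return ranked

print(recommend_item_based(critics, item_sims, 'Toby'))
```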
Demo: Item-based recommendation
Tradeoffs between user-based and item-based filtering

User-based filtering is easier to implement and doesn't have the overhead of maintaining the item-item comparisons

User-based filtering also deals better with datasets that change frequently

For small, relatively dense datasets, both user-based and item-based filtering perform equally well

For large, sparse datasets, item-based filtering tends to work better
The cold-start problem

What happens when a new user signs up to use our recommendation service, or a new item is added?

CF won't work if we don't have any ratings for the new user/item

This is known as the cold-start problem


Potential solutions for new items

Adopt a hybrid recommendation approach:

Use content-based features to find similar items

Bootstrap ratings for the new item, for the users that rated similar items, by averaging the ratings those users gave to those similar items
Potential solutions for new users

Somewhat harder problem

Typically need to bootstrap the user profile

Perhaps use web browsing behaviour (i.e. from tracking cookies) to attempt to predict what kinds of things the user is interested in

Perhaps ask new users to perform tasks, e.g. answer some questions, etc.


AudioScrobbler: An ECS Success Story

Richard Jones developed Audioscrobbler as his third-year project

Recommended new music based on your listening

CF-based approach like we looked at today

Richard moved to London and launched Audioscrobbler as a web startup

In 2005 Audioscrobbler.com merged with an internet radio station, creating Last.fm

CBS Interactive acquired Last.fm for $280 million
Summary

The ability to make recommendations based on data is one of the big drivers of data mining

Potential to be worth $$$ in numerous industries

Collaborative Filtering systems make use of user behaviour or ratings in order to gather the information needed to make recommendations

CF systems don't use content-based features (but other forms of recommender systems do)

The user-based neighbourhood approach to CF computes the similarity between users, and uses this to predict unseen item ratings on the basis of aggregations of the ratings of the similar users
