L6 Recommendation


COMP6237 Data Mining

Making Recommendations
Jonathon Hare

jsh2@ecs.soton.ac.uk
Introduction

Recommender systems 101

Taxonomy of recommender systems

Collaborative Filtering

Collecting user preferences as sparse vectors

Finding similar users

User and item based filtering


Problem statement: making recommendations

Can we mine data to predict:

which things people might like to buy

which films they would like to watch

which websites and books they might like to read

which people they might like to date

on the basis of past behaviour and shared tastes?


Recommender systems 101
Amazon makes recommendations based on past purchase history

Google News recommends articles based on click and search history - millions of users, millions of articles (Das et al.)

Netflix predicts movies you might like based on past numeric ratings

OkCupid predicts people you might like based on past site usage/behaviour and answers to questions

Facebook predicts people you might know based on your position within a social network made up of friends with different sets of interests, geographical locations, educational backgrounds, etc.
Recommender systems taxonomy
Collaborative Filtering

(Illustration: a group of users and a group of items; users express preferences such as "I like oranges" and "I like kiwis")

Observe user preferences

Make predictions about new preferences: does Bob like strawberries?
Collaborative Filtering and Recommender Systems
Insight: personal preferences are correlated

If Jack loves A and B, and Jill loves A, B, and C, then Jack is more
likely to love C

Collaborative Filtering Task

Discover patterns in observed preference behaviour (e.g. purchase history, item ratings, click counts) across a community of users

Predict new preferences based on those patterns

Does not rely on item or user attributes (e.g. demographic info, author,
genre)

Content-based recommendation: complementary approach


Recommender systems taxonomy

(Taxonomy diagram; this lecture covers collaborative filtering)
Collecting user preferences

Need to make measurements of user preferences.

For example, consider film recommendation:

Might ask users to rate films between 0 and 5 stars

Ratings could be continuous (allowing fractional values), or could be restricted to integers
Mapping user actions to numerical scores

Many different ways to map user preferences to


numerical scores

Will depend on the variables that can be measured

For example:

Concert Tickets:    Bought = 1, Didn't buy = 0
Online Shopping:    Bought = 2, Browsed = 1, Didn't buy = 0
Site Recommender:   Liked = 1, No vote = 0, Disliked = -1
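As a sketch of how such a mapping might look in code (the dictionary layout and action names below are illustrative, not part of the lecture):

```python
# Hypothetical action-to-score mappings for three different services.
ACTION_SCORES = {
    'concert_tickets':  {'bought': 1, 'did_not_buy': 0},
    'online_shopping':  {'bought': 2, 'browsed': 1, 'did_not_buy': 0},
    'site_recommender': {'liked': 1, 'no_vote': 0, 'disliked': -1},
}

print(ACTION_SCORES['online_shopping']['browsed'])  # 1
```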
Example dataset:

         Lady in    Snakes on  Just My  Superman  The Night  You, Me
         the Water  a Plane    Luck     Returns   Listener   and Dupree
Lisa     2.5        3.5        3.0      3.5       3.0        2.5
Gene     3.0        3.5        1.5      5.0       3.0        3.5
Michael  2.5        3.0        -        3.5       4.0        -
Claudia  -          3.5        3.0      4.0       4.5        2.5
Mick     3.0        4.0        2.0      3.0       3.0        2.0
Jack     3.0        4.0        -        5.0       3.0        3.5
Toby     -          4.5        -        4.0       -          1.0
This dataset is sparse. It has missing values!

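For the sketches in the rest of these notes, one convenient in-memory representation of this table is a dictionary of dictionaries, with missing ratings simply absent (this nested-dict layout is an implementation choice, not something the slides prescribe):

```python
# Critics dataset: user -> {film -> rating}; missing ratings are absent keys.
critics = {
    'Lisa':    {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5,
                'Just My Luck': 3.0, 'Superman Returns': 3.5,
                'The Night Listener': 3.0, 'You, Me and Dupree': 2.5},
    'Gene':    {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5,
                'Just My Luck': 1.5, 'Superman Returns': 5.0,
                'The Night Listener': 3.0, 'You, Me and Dupree': 3.5},
    'Michael': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.0,
                'Superman Returns': 3.5, 'The Night Listener': 4.0},
    'Claudia': {'Snakes on a Plane': 3.5, 'Just My Luck': 3.0,
                'Superman Returns': 4.0, 'The Night Listener': 4.5,
                'You, Me and Dupree': 2.5},
    'Mick':    {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
                'Just My Luck': 2.0, 'Superman Returns': 3.0,
                'The Night Listener': 3.0, 'You, Me and Dupree': 2.0},
    'Jack':    {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
                'Superman Returns': 5.0, 'The Night Listener': 3.0,
                'You, Me and Dupree': 3.5},
    'Toby':    {'Snakes on a Plane': 4.5, 'Superman Returns': 4.0,
                'You, Me and Dupree': 1.0},
}
```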
Aside: Sparse data
This sparsity is important

We can use it to our advantage:

More efficient storage

Faster computation

But we have to use appropriate data structures

Sparse vectors

Typically implemented with hash maps (fast random insertion) or parallel sorted arrays (slow random insertion but faster reads)

It has disadvantages too
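As a toy illustration of the hash-map option (this class is my own sketch, not lecture code): only the non-zero entries are stored, so storage and dot products scale with the number of non-zeros rather than the dimensionality.

```python
class SparseVector:
    """Toy sparse vector backed by a hash map: index -> non-zero value."""

    def __init__(self, dimensionality):
        self.dim = dimensionality
        self.entries = {}  # absent indices are implicitly zero

    def __setitem__(self, i, value):
        if value == 0:
            self.entries.pop(i, None)  # keep the map sparse
        else:
            self.entries[i] = value

    def __getitem__(self, i):
        return self.entries.get(i, 0)

    def dot(self, other):
        # Iterate over the vector with fewer stored entries: cost depends
        # on the number of non-zeros, not on self.dim.
        a, b = self.entries, other.entries
        if len(b) < len(a):
            a, b = b, a
        return sum(v * b.get(i, 0) for i, v in a.items())

v, w = SparseVector(1_000_000), SparseVector(1_000_000)
v[3], v[999_999] = 2.5, 4.0
w[3] = 2.0
print(v.dot(w))  # 5.0
```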


Aside: spaces, distances and similarity
(Pipeline diagram: DATA (input) → Preprocessing → Feature Extraction → Intelligent Algorithm / Data Mining → INFORMATION (output))
Key terminology
feature vector: a mathematical vector

just a list of (usually real) numbers

has a fixed number of elements in it

the number of elements is the dimensionality of the vector

represents a point in a feature space, or equally a direction in the feature space

the dimensionality of a feature space is the dimensionality of every vector within it

vectors of differing dimensionality can't exist in the same feature space
Distance and similarity

Feature extractors are often defined so that they produce vectors that are close together for similar inputs

The closeness of two vectors can be computed in the feature space by measuring a distance between the vectors
Euclidean distance (L2 distance)

The L2 distance is the most intuitive distance: the straight-line distance between two points

Computed via an extension of Pythagoras' theorem to n dimensions:

$$D_2(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2} = \|p - q\| = \sqrt{(p - q) \cdot (p - q)}$$
L1 distance (aka Taxicab/Manhattan distance)

The L1 distance is computed along paths parallel to the axes of the space:

$$D_1(p, q) = \|p - q\|_1 = \sum_{i=1}^{n} |p_i - q_i|$$
The Lp distances

Generalisation of the L1 and L2 distances to higher orders:

$$D_p(x, y) = \|x - y\|_p = \left(\sum_{i=1}^{n} |x_i - y_i|^p\right)^{1/p}$$
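A minimal sketch of these distances (the function name and the use of plain Python lists are my own choices); p=1 and p=2 recover the L1 and L2 distances:

```python
def minkowski(p_vec, q_vec, p=2):
    """L_p (Minkowski) distance between two equal-length vectors.

    p=1 gives the taxicab (L1) distance; p=2 gives the Euclidean (L2) distance.
    """
    return sum(abs(a - b) ** p for a, b in zip(p_vec, q_vec)) ** (1.0 / p)

print(minkowski([0, 0], [3, 4], p=2))  # 5.0 (straight-line distance)
print(minkowski([0, 0], [3, 4], p=1))  # 7.0 (taxicab distance)
```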
Cosine Similarity

Cosine similarity measures the cosine of the angle between two vectors

It is not a distance!

$$\cos(\theta) = \frac{p \cdot q}{\|p\| \, \|q\|} = \frac{\sum_{i=1}^{n} p_i q_i}{\sqrt{\sum_{i=1}^{n} p_i^2} \, \sqrt{\sum_{i=1}^{n} q_i^2}}$$

Useful if you don't care about the relative length of the vectors
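A matching sketch for cosine similarity; the final line illustrates why it ignores vector length:

```python
import math

def cosine_similarity(p, q):
    """Cosine of the angle between vectors p and q."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)

# Scaling a vector does not change the similarity - only direction matters:
print(cosine_similarity([1, 2], [2, 4]))  # 1.0
```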
Measuring user similarity

Can we use this data to measure and rank the similarity of users?

Need to define a similarity measure or similarity score

On the basis that similar users have similar tastes (i.e. like the same movies)

Must take into account sparsity: not all users have seen all movies

Typically the score is a bounded numeric value: 1 means the same; 0 means completely dissimilar


Demo: Plots of Euclidean Space
Euclidean Similarity

There are many ways to compute a similarity based on Euclidean distance

We could choose:

$$sim_{L2}(x, y) = \frac{1}{1 + \sqrt{\sum_{i \in I_{xy}} (r_{x,i} - r_{y,i})^2}}$$

where $r_{x,i}$ refers to the rating given by user x to item i, and $I_{xy}$ is the set of items rated by both user x and user y
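A direct transcription of this formula against the critics dictionary above (returning 0 when two users share no rated items is my own convention):

```python
import math

def sim_l2(prefs, x, y):
    """1 / (1 + L2 distance) over the items rated by both x and y."""
    shared = set(prefs[x]) & set(prefs[y])
    if not shared:
        return 0.0  # no overlap: treat the users as completely dissimilar
    dist_sq = sum((prefs[x][i] - prefs[y][i]) ** 2 for i in shared)
    return 1.0 / (1.0 + math.sqrt(dist_sq))

print(sim_l2(critics, 'Lisa', 'Gene'))  # ~0.294
```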
Pearson Correlation

An alternative measure of similarity is to compute the correlation of the users based on the ratings they share

Pearson's correlation is the standard measure of dependence between two random variables:

$$sim_{pearson}(x, y) = \frac{\sum_{i \in I_{xy}} (r_{x,i} - \bar{r}_x)(r_{y,i} - \bar{r}_y)}{\sqrt{\sum_{i \in I_{xy}} (r_{x,i} - \bar{r}_x)^2} \, \sqrt{\sum_{i \in I_{xy}} (r_{y,i} - \bar{r}_y)^2}}$$

where $\bar{r}_x$ is the average rating user x gave for all items in $I_{xy}$
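A sketch of the same computation (again, returning 0 in the degenerate cases is my own convention; note the means are taken over the shared items $I_{xy}$ only):

```python
import math

def sim_pearson(prefs, x, y):
    """Pearson correlation of x's and y's ratings over their shared items."""
    shared = set(prefs[x]) & set(prefs[y])
    n = len(shared)
    if n == 0:
        return 0.0
    mean_x = sum(prefs[x][i] for i in shared) / n
    mean_y = sum(prefs[y][i] for i in shared) / n
    num = sum((prefs[x][i] - mean_x) * (prefs[y][i] - mean_y) for i in shared)
    den = (math.sqrt(sum((prefs[x][i] - mean_x) ** 2 for i in shared)) *
           math.sqrt(sum((prefs[y][i] - mean_y) ** 2 for i in shared)))
    if den == 0:
        return 0.0  # a user gave every shared item the same rating
    return num / den

print(sim_pearson(critics, 'Lisa', 'Jack'))  # ~0.747, despite Jack's higher ratings
```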
Demo: Plots of Correlation
Important Aside: Grade inflation

Users are notoriously bad at giving consistent absolute ratings

two users might agree that they both enjoyed a film, but one might give it a 4 and the other a 5

e.g. compare Lisa & Jack in the sample data

Pearson correlation corrects for this automatically, but the Euclidean similarity doesn't

Data normalisation and mean centring (data standardisation) can overcome this
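As a sketch, mean centring just subtracts each user's own average from their ratings before any comparison:

```python
def mean_centre(ratings):
    """Subtract the user's own average rating from each of their ratings."""
    mu = sum(ratings.values()) / len(ratings)
    return {item: r - mu for item, r in ratings.items()}

# After centring, a habitual 4-giver and a habitual 5-giver who agree
# about a film produce similar values.
print(mean_centre(critics['Jack']))
```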
User-based Filtering: Rating and Ranking the critics

We now have a set of measures for computing the similarity between users

Can use this to produce a ranked list of the best matches (most similar users) to a target user

Typically want the top-N users

When computing the ranked list, might only want to consider a subset of users, e.g. those who rated a particular item (see the sketch below)


Demo: User-User similarity
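A sketch of what such a demo might compute, ranking every other user by similarity to a target user (the function name is my own):

```python
def top_matches(prefs, user, n=5, similarity=sim_pearson):
    """Return the n users most similar to `user`, most similar first."""
    scores = [(similarity(prefs, user, other), other)
              for other in prefs if other != user]
    scores.sort(reverse=True)
    return scores[:n]

print(top_matches(critics, 'Toby', n=3))
```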
User-based Filtering: Recommending Items

Now we know how to find similar users, how can we recommend items?

Basic idea: predict the rating, $r_{u,i}$, of an item i by user u as an aggregation of the ratings of item i by the users similar to u:

$$r_{u,i} = \mathrm{agg}_{u' \in U}(r_{u',i})$$

where U is the set of top users most similar to u that rated item i
One possible aggregation approach:

Weight the scores of each item each similar user has rated, i.e. multiply the item scores by the respective user's similarity, and combine into a score for each item

Problem: can't just add them together - items that have more ratings would always have an advantage

Need to normalise by the sum of the similarities:

$$r_{u,i} = \frac{\sum_{u' \in U} sim(u, u') \, r_{u',i}}{\sum_{u' \in U} |sim(u, u')|}$$
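One way this aggregation could be implemented (restricting to positively-similar users, which makes |sim(u, u')| equal to sim(u, u') in the denominator, is my own simplification):

```python
def recommend_user_based(prefs, user, similarity=sim_pearson):
    """Predict ratings for the items `user` has not rated, as a
    similarity-weighted average of the other users' ratings."""
    totals, sim_sums = {}, {}
    for other in prefs:
        if other == user:
            continue
        sim = similarity(prefs, user, other)
        if sim <= 0:
            continue  # ignore dissimilar users; makes |sim| == sim below
        for item, rating in prefs[other].items():
            if item in prefs[user]:
                continue  # only predict unrated items
            totals[item] = totals.get(item, 0.0) + sim * rating
            sim_sums[item] = sim_sums.get(item, 0.0) + sim
    ranked = [(totals[i] / sim_sums[i], i) for i in totals]
    ranked.sort(reverse=True)
    return ranked

print(recommend_user_based(critics, 'Toby'))
```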
Demo: User-based recommendation
(Some) other aggregation options

Don't weight by similarity score, but still compute the average over only the similar users:

$$r_{u,i} = \frac{1}{N} \sum_{u' \in U} r_{u',i}$$

Compensate for users that have high or low averages by aggregating deviations from each user's mean:

$$r_{u,i} = \bar{r}_u + \frac{\sum_{u' \in U} sim(u, u') \, (r_{u',i} - \bar{r}_{u'})}{\sum_{u' \in U} |sim(u, u')|}$$

where $\bar{r}_{u'}$ is the average score of user u' over all items they have scored
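A sketch of the mean-centred variant (here the user means are taken over all of each user's rated items, per the definition above):

```python
def recommend_mean_centred(prefs, user, similarity=sim_pearson):
    """Predict as the user's own mean plus a weighted average of the
    other users' deviations from their own means."""
    means = {u: sum(r.values()) / len(r) for u, r in prefs.items()}
    totals, sim_sums = {}, {}
    for other in prefs:
        if other == user:
            continue
        sim = similarity(prefs, user, other)
        if sim == 0:
            continue
        for item, rating in prefs[other].items():
            if item in prefs[user]:
                continue
            totals[item] = totals.get(item, 0.0) + sim * (rating - means[other])
            sim_sums[item] = sim_sums.get(item, 0.0) + abs(sim)
    ranked = [(means[user] + totals[i] / sim_sums[i], i) for i in totals]
    ranked.sort(reverse=True)
    return ranked

print(recommend_mean_centred(critics, 'Toby'))
```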
Matching Products: More like this

In just the same way that we computed the similarity between users, we can flip (transpose) the problem and compute the similarity between items

Can use this as a fuzzy basis for recommending alternative items

In a future lecture we'll look at a more structured way of identifying which products people buy together, using affinity analysis or Market Basket Analysis
Demo: Item-Item similarity
Problems with user-based filtering

User-based filtering relies on computing the similarity against every user

With millions of users this might be a problem - it is computationally hard!

If there are thousands of products there might be little overlap between users, making effective similarity computation hard


Item-based collaborative filtering

Designed to work around the problems of user-based filtering

Based on the idea that comparisons between items will not change as frequently as comparisons between users

Steps:

Precompute the most similar items for each item and store them

To make a recommendation for a user, look at their top-rated items and aggregate items similar to those from the precomputed item similarities

The precomputed item similarities will obviously change with new ratings, but they will do so slowly
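A sketch of the precomputation step, reusing top_matches and sim_l2 from the earlier sketches: transpose the preference matrix so that items play the role of users, then rank each item's neighbours:

```python
def transform_prefs(prefs):
    """Flip user -> {item: rating} into item -> {user: rating}."""
    flipped = {}
    for user, ratings in prefs.items():
        for item, rating in ratings.items():
            flipped.setdefault(item, {})[user] = rating
    return flipped

def similar_items(prefs, n=10, similarity=sim_l2):
    """Precompute the n most similar items for every item."""
    item_prefs = transform_prefs(prefs)
    return {item: top_matches(item_prefs, item, n=n, similarity=similarity)
            for item in item_prefs}

item_sims = similar_items(critics, n=5)
```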
Demo: Precomputing Item-Item similarity
Computing recommendations

For an item j that user u has not yet rated, but that has a top-N similarity to items the user has rated, estimate the rating as:

$$r_{u,j} = \frac{\sum_{i \in I} sim(j, i) \, r_{u,i}}{\sum_{i \in I} sim(j, i)}$$

where I is the subset of the N items most similar to j that have been rated by user u
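A sketch of this estimate following the common pattern of walking the user's own ratings and the stored neighbour lists (item_sims comes from the precomputation sketch above; skipping non-positive similarities is my own simplification):

```python
def recommend_item_based(prefs, item_sims, user):
    """Score unrated items from the user's own ratings and the
    precomputed item-item similarity lists."""
    user_ratings = prefs[user]
    scores, sim_totals = {}, {}
    for rated_item, rating in user_ratings.items():
        for sim, candidate in item_sims[rated_item]:
            if candidate in user_ratings or sim <= 0:
                continue  # skip already-rated items and non-positive sims
            scores[candidate] = scores.get(candidate, 0.0) + sim * rating
            sim_totals[candidate] = sim_totals.get(candidate, 0.0) + sim
    ranked = [(scores[i] / sim_totals[i], i) for i in scores]
    ranked.sort(reverse=True)
    return ranked

print(recommend_item_based(critics, item_sims, 'Toby'))
```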
Demo: Item-based recommendation
Tradeoffs between user-based and item-based filtering

User-based filtering is easier to implement and doesn't have the overhead of maintaining the item-item comparisons

User-based filtering also deals better with datasets that change frequently

For small, relatively dense datasets, both user-based and item-based filtering perform equally well

For large, sparse datasets, item-based filtering tends to work better
The cold-start problem

What happens when a new user signs up to use our recommendation service, or a new item is added?

CF won't work if we don't have any ratings for the new user/item

This is known as the cold-start problem


Potential solutions for new items

Adopt a hybrid recommendation approach:

Use content-based features to find similar items

Bootstrap ratings for the new item, for the users that rated similar items, by averaging the ratings those users gave to those similar items
Potential solutions for new users

Somewhat harder problem

Typically need to bootstrap the user profile

Perhaps use web browsing behaviour (i.e. from tracking cookies) to attempt to predict what kinds of things the user is interested in

Perhaps ask new users to perform tasks, e.g. answer some questions, etc.


AudioScrobbler: An ECS Success Story

Richard Jones developed Audioscrobbler as his third-year project

Recommended new music based on your listening

CF-based approach like we looked at today

Richard moved to London and launched Audioscrobbler as a web startup

In 2005 Audioscrobbler.com merged with an internet radio station, creating Last.fm

CBS Interactive acquired Last.fm for $280 million
Summary

The ability to make recommendations based on data is one of the big drivers of data mining

Potential to be worth $$$ in numerous industries

Collaborative Filtering systems make use of user behaviour or ratings in order to gather the information needed to make recommendations

CF systems don't use content-based features (but other forms of recommender systems do)

The user-based neighbourhood approach to CF computes the similarity between users, and uses this to predict unseen item ratings on the basis of aggregations of the ratings of the similar users
