
Recommender Systems

程健 (Jian Cheng), Researcher
jcheng@nlpr.ia.ac.cn

Institute of Automation, Chinese Academy of Sciences
National Laboratory of Pattern Recognition

1
Index
➢ What is recommender system?

➢ Traditional Methods

➢ Deep Learning based Methods

➢ RS Systems

➢ Conclusion

2
Index
➢ What is recommender system?

➢ Traditional Methods

➢ Deep Learning based Methods

➢ RS Systems

➢ Conclusion

3
Everything is recommendation
Everything is personalized

JD.com (京东商城)
Everything is personalized

One person's poison is another person's honey 6
Everything is personalized
• Video sites
YouTube, Tudou, Hulu, iQIYI, etc.
• E-commerce sites
Taobao, Amazon, etc.
• Social networks
Facebook, Renren, Twitter, Weibo, etc.
• ……
The Long Tail

BusinessWeek, "Best Idea of 2005"


The Long Tail
 Amazon: 35% of sales come from recommendations
 Google News: recommendations increased the click-through rate by 38%
 Netflix: 2/3 of movie rentals come from recommendations

"We are leaving the age of information and entering the Age of Recommendation"

– The Long Tail (Chris Anderson)


RS Definition
⚫ RS seen as a function

⚫ Given:
➢ User model (e.g. ratings, preferences, demographics, situational context)
➢ Items (with or without description of item characteristics)
⚫ Calculate:
➢ Relevance score used for ranking
⚫ Target:
➢ Rating Prediction & Top-N Recommendation

⚫ But:
➢ Remember that relevance might be context dependent
➢ Characteristics of the list itself might be important (diversity)
10
Performance Evaluation
⚫ Measures for rating prediction
• MAE $= \frac{1}{N}\sum_{(u,i)} |\hat{r}_{ui} - r_{ui}|$
• RMSE $= \sqrt{\frac{1}{N}\sum_{(u,i)} (\hat{r}_{ui} - r_{ui})^2}$
11
Performance Evaluation
⚫ Measures for top-N recommendation
• NDCG (Normalized Discounted Cumulative Gain): the definition is not unique; one common form is
$DCG@N = \sum_{i=1}^{N} \frac{2^{rel_i} - 1}{\log_2(i+1)}, \qquad NDCG@N = \frac{DCG@N}{IDCG@N}$
where the Ideal DCG (IDCG) is the DCG of the best possible ranking

• F1 Score: $F_1 = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall}$

• Mean Average Precision (MAP): the mean, over all users, of the average precision of each user's ranked list
12
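To make these metrics concrete, here is a minimal numpy sketch (function names and the toy relevance list are mine, not from the slides; NDCG follows the 2^rel - 1 gain convention above):

```python
import numpy as np

def dcg_at_k(rels, k):
    """DCG with the (2^rel - 1) gain; `rels` is relevance in ranked order."""
    rels = np.asarray(rels, dtype=float)[:k]
    return float(np.sum((2.0 ** rels - 1) / np.log2(np.arange(2, rels.size + 2))))

def ndcg_at_k(rels, k):
    """NDCG = DCG / IDCG, where IDCG is the DCG of the ideally re-sorted list."""
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0

def f1_at_k(rels, k, n_relevant):
    """F1 = 2PR / (P + R) for the top-k list."""
    hits = float(np.sum(np.asarray(rels)[:k] > 0))
    if hits == 0:
        return 0.0
    precision, recall = hits / k, hits / n_relevant
    return 2 * precision * recall / (precision + recall)

def average_precision(rels):
    """AP for one user; MAP is the mean of AP over all users."""
    hits = np.asarray(rels) > 0
    if not hits.any():
        return 0.0
    precisions = np.cumsum(hits) / np.arange(1, hits.size + 1)
    return float(precisions[hits].mean())

rels = [1, 0, 1, 0, 0]   # relevance of a ranked top-5 list (1 = relevant)
print(ndcg_at_k(rels, 5), f1_at_k(rels, 5, n_relevant=3), average_precision(rels))
```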
Index
➢ What is recommender system?

➢ Traditional Methods

➢ Deep Learning based Methods

➢ RS Systems

➢ Conclusion

13
A glance of Paradigms for RS
Personal Recommendation
[Figure: user profile & contextual information feed the Recommender System, which outputs a recommendation list.]
14
A glance of Paradigms for RS
Content-based: "Show me more of the
same what I've liked" Recommendation list

User profile & Recommender System


contextual information

Item Attributes

15
A glance of Paradigms for RS
Collaborative: "Tell me what's popular Recommendation list
among my peers"
Recommender System

Community Data

16
A glance of Paradigms for RS
Hybrid: combinations of various inputs
[Figure: user profile & contextual information, community data, and item attributes all feed the Recommender System, which outputs a recommendation list.]
17
Content-Based Recommendation
⚫ Recommendations based on content of items rather than on
other users’ opinions/interactions
⚫ Goal: recommend items similar to those the user liked
⚫ Common for recommending text-based products (web
pages, news messages)
⚫ Items to recommend are “described” by their associated
features (e.g. keywords)
⚫ User model structured in a way "similar" to the content:
features/keywords more likely to occur in the preferred
documents (lazy approach)
⚫ The user model can be a classifier based on whatever
technique (Neural Networks, Naïve Bayes...)

18
Content-Based Recommendation
⚫ Content representation and item similarities
Express item features as:
➢ TF-IDF
➢ N-Gram
➢ LDA
➢ Word2Vec

⚫ Compute the similarity of an unseen item with the user profile based on the keyword features

19
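A minimal sketch of the TF-IDF variant with scikit-learn, assuming the simplest "lazy" user model from the previous slide: the profile is just the mean TF-IDF vector of the items the user liked (the toy item texts are invented):

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

items = [                                    # toy item descriptions (invented)
    "space opera with epic fleet battles",
    "romantic comedy set in paris",
    "gritty space thriller on a mining ship",
]
X = TfidfVectorizer().fit_transform(items)   # item-by-keyword TF-IDF matrix

liked = [0]                                  # the user liked item 0
profile = np.asarray(X[liked].mean(axis=0))  # lazy user model: mean of liked items

# Score unseen items by cosine similarity with the profile
scores = cosine_similarity(profile, X).ravel()
print(scores)   # item 2 (another space movie) should outscore item 1
```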
Content-Based Recommendation
⚫ Pros:
➢ No need for data on other users: No cold-start or sparsity
➢ Able to recommend to users with unique tastes
➢ Able to recommend new and unpopular items
➢ Can provide explanations by listing content-features

⚫ Cons:
➢ Requires content that can be encoded as meaningful features
(difficult in some domains/catalogs)
➢ Users represented as learnable function of content features
➢ Difficult to implement serendipity
➢ Easy to overfit (e.g. for a user with few data points)

20
Collaborative Filtering
⚫ List of m Users and a list of n Items

⚫ Each user has a list of items with an associated opinion
➢ Explicit opinion (e.g. ratings)
➢ Implicit opinion (e.g. purchase records)

⚫ Active user for whom the CF prediction task is performed

⚫ Metric for measuring similarity between users


⚫ Method for selecting a subset of neighbors
⚫ Method for predicting a rating for items not currently
rated by the active user.
21
Collaborative Filtering

22
Collaborative Filtering
⚫ memory-based CF
➢ User-based CF
➢ Item-based CF

⚫ model-based CF
➢ First develop a model of user
➢ Type of model:
1. Probabilistic (e.g. Bayesian Network)
2. Clustering
3. Rule-based approaches (e.g. Association
Rules)
4. Classification/Regression
5. … 23
User-based CF
The basic steps:
1. Identify set of ratings for the target/active user
2. Identify set of users most similar to the target/active user
according to a similarity function (neighborhood formation)
3. Identify the products these similar users liked
4. Generate a prediction
5. Based on this predicted rating recommend a set of top N
products

24
User-based CF
⚫ A collection of m users and a collection of n products

⚫ An m × n matrix of ratings $v_{ij}$, with $v_{ij} = \varnothing$ if user i did not rate product j

⚫ The prediction for user i and product j is computed as
$p_{ij} = \bar{v}_i + \kappa \sum_{k} w_{ik}\,(v_{kj} - \bar{v}_k)$
where $\bar{v}_i$ is user i's mean rating and $\kappa$ is a normalizing factor (typically $1/\sum_k |w_{ik}|$)

⚫ Similarity can be computed by Pearson correlation
$w_{ik} = \frac{\sum_{j}(v_{ij} - \bar{v}_i)(v_{kj} - \bar{v}_k)}{\sqrt{\sum_{j}(v_{ij} - \bar{v}_i)^2 \,\sum_{j}(v_{kj} - \bar{v}_k)^2}}$
25
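A numpy sketch of the prediction and Pearson-similarity formulas above, using 0 to mark "not rated" and the $\kappa = 1/\sum_k |w_{ik}|$ normalization (variable names are mine):

```python
import numpy as np

R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [1, 0, 4, 4.]])       # rows = users, cols = items, 0 = unrated

def pearson(a, b):
    """Pearson correlation over the items both users rated."""
    mask = (a > 0) & (b > 0)
    if mask.sum() < 2:
        return 0.0
    da, db = a[mask] - a[mask].mean(), b[mask] - b[mask].mean()
    denom = np.sqrt((da ** 2).sum() * (db ** 2).sum())
    return float(da @ db / denom) if denom else 0.0

def predict(R, i, j):
    """User i's mean rating plus weighted deviations of the other users."""
    vi_bar = R[i][R[i] > 0].mean()
    num = den = 0.0
    for k in range(R.shape[0]):
        if k == i or R[k, j] == 0:
            continue
        w = pearson(R[i], R[k])
        num += w * (R[k, j] - R[k][R[k] > 0].mean())
        den += abs(w)
    return vi_bar + num / den if den else vi_bar

print(predict(R, i=1, j=1))   # predict user 1's rating of item 1
```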
User-based CF Example

26
User-based CF Example

27
User-based CF Example

28
User-based CF Example

29
Item-based CF Example
The basic steps:
1. Look into the items the target user has rated
2. Compute how similar they are to the target item
3. Select k most similar items
4. Compute the prediction by taking a weighted average of the target user's ratings on the most similar items

30
Item Similarity Computation
⚫ Similarity: find users who have rated both items and apply a similarity function to their ratings
⚫ Cosine-based similarity (difference in rating scale between users is not taken into account)
$sim(i,j) = \cos(\vec{i},\vec{j}) = \frac{\vec{i}\cdot\vec{j}}{\|\vec{i}\|\,\|\vec{j}\|}$

⚫ Adjusted cosine similarity (takes care of the difference in rating scale)
$sim(i,j) = \frac{\sum_{u\in U}(R_{u,i}-\bar{R}_u)(R_{u,j}-\bar{R}_u)}{\sqrt{\sum_{u\in U}(R_{u,i}-\bar{R}_u)^2}\,\sqrt{\sum_{u\in U}(R_{u,j}-\bar{R}_u)^2}}$
31
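A numpy sketch of adjusted cosine similarity on the same 0-for-unrated convention (a sketch, not the original paper's implementation):

```python
import numpy as np

def adjusted_cosine(R, i, j):
    """Subtract each user's mean rating before comparing items i and j."""
    mask = (R[:, i] > 0) & (R[:, j] > 0)          # users who rated both items
    if not mask.any():
        return 0.0
    rated = np.where(R > 0, R, np.nan)
    user_means = np.nanmean(rated, axis=1)        # per-user mean over rated items
    di = R[mask, i] - user_means[mask]
    dj = R[mask, j] - user_means[mask]
    denom = np.sqrt((di ** 2).sum() * (dj ** 2).sum())
    return float(di @ dj / denom) if denom else 0.0

R = np.array([[5, 3, 4], [4, 0, 5], [1, 1, 0.]])
print(adjusted_cosine(R, 0, 2))
```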
Item Similarity Computation
⚫ Alternative similarity metric: correlation-based (Pearson) similarity between the two items' rating vectors
$sim(i,j) = \frac{\sum_{u\in U}(R_{u,i}-\bar{R}_i)(R_{u,j}-\bar{R}_j)}{\sqrt{\sum_{u\in U}(R_{u,i}-\bar{R}_i)^2}\,\sqrt{\sum_{u\in U}(R_{u,j}-\bar{R}_j)^2}}$
32
Model-based CF
Motivated by the Netflix Prize (launched in Oct. 2006)
⚫ Task:
recommendations of higher quality than Cinematch (RMSE = 0.9525)
⚫ Dataset:
users: 480,000
movies: 17,770
observed ratings: <1%

Improve RMSE by 10% = $1 million!

33
Model-based CF
Motivated by Netflix Prize (launched in Oct. 2006)

⚫ Measure:
$RMSE = \sqrt{\frac{1}{|T|}\sum_{(u,i)\in T}(\hat{r}_{ui} - r_{ui})^2}$
34
Model-based CF

35
Model-based CF
2009 Netflix Prize Results
⚫ Top 2 single algorithms:
SVD/MF - Prize RMSE: 0.8914
RBM - Prize RMSE: 0.8990

⚫ Linear blend: Prize RMSE 0.88

⚫ Currently in use as part of Netflix's rating prediction component

36
Matrix Factorization
⚫ Basic idea
[Figure: a sparse user-item rating matrix (entries such as 5, 3, 4, 1, ...) is factorized as $R \approx P\,Q^T$, where $P$ is the user factor matrix and $Q^T$ is the item factor matrix.]

⚫ factor size k << dim of user/item

⚫ User factor vector $p_u = (f_1, f_2, f_3, \ldots, f_k)$ and item factor vector $q_v = (f_1', f_2', f_3', \ldots, f_k')$


37
Matrix Factorization
⚫ Basic idea: can we predict the missing entries?
[Same factorization figure, with an unobserved rating highlighted.]

⚫ factor size k << dim of user/item

⚫ User factor vectors $p_u$ and item factor vectors $q_v$


38
Matrix Factorization
⚫ Basic idea: yes, we can: the missing rating is predicted as $\hat{r}_{uv} = p_u^T q_v$
[Same factorization figure, with the predicted entry filled in.]

⚫ factor size k << dim of user/item

⚫ User factor vectors $p_u$ and item factor vectors $q_v$


39
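A minimal SGD sketch of this basic idea: learn P and Q on the observed entries only, then read predictions for the missing entries off P Qᵀ (the learning rate, regularization, and epoch count are arbitrary choices, not from the slides):

```python
import numpy as np

R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [1, 0, 4, 4.]])          # 0 = unobserved
k, lr, reg, epochs = 3, 0.01, 0.02, 2000
rng = np.random.default_rng(0)
P = rng.normal(scale=0.1, size=(R.shape[0], k))   # user factors
Q = rng.normal(scale=0.1, size=(R.shape[1], k))   # item factors

obs = np.argwhere(R > 0)                # only observed (user, item) pairs
for _ in range(epochs):
    for u, v in obs:
        e = R[u, v] - P[u] @ Q[v]                 # prediction error
        P[u] += lr * (e * Q[v] - reg * P[u])      # gradient steps
        Q[v] += lr * (e * P[u] - reg * Q[v])

print(np.round(P @ Q.T, 2))             # missing entries are now filled in
```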
Non-negative
Matrix Factorization
⚫ All entries in the factorized P and Q should be >= 0
[Same factorization figure, with the constraints $p_u \ge 0$ and $q_v \ge 0$.]

⚫ Explanation: real-world data (e.g. images) is usually represented with non-negative values, and negative factor values have no natural interpretation.

40
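The slides do not fix a solver; one common choice is the Lee-Seung multiplicative update rule, which keeps P and Q non-negative as long as they are initialized non-negative. A sketch on a fully observed toy matrix:

```python
import numpy as np

def nmf(R, k=3, iters=500, eps=1e-9):
    """Approximate R >= 0 as P @ Q.T with P, Q >= 0 (multiplicative updates)."""
    rng = np.random.default_rng(0)
    P = rng.random((R.shape[0], k))
    Q = rng.random((R.shape[1], k))
    for _ in range(iters):
        # Multiplicative updates never flip signs: factors stay non-negative.
        P *= (R @ Q) / (P @ Q.T @ Q + eps)
        Q *= (R.T @ P) / (Q @ P.T @ P + eps)
    return P, Q

R = np.array([[5, 3, 2, 1], [4, 3, 1, 1], [1, 1, 4, 5.]])
P, Q = nmf(R)
print(np.round(P @ Q.T, 2))
```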
Non-negative
Matrix Factorization
⚫ ‘Orthogonal NMF’ == ‘Kernel K-Means Clustering’
Orthogonal NMF
$\min_{F \ge 0,\, G \ge 0,\, G^T G = I} \|X - F G^T\|^2$
is equivalent to K-means clustering, where each row of $G$ can be viewed as a probability distribution over the factors (clusters).

Proof
1. Kernel K-means clustering tries to minimize
$J = \sum_{k=1}^{K} \sum_{i \in C_k} \|\phi(x_i) - m_k\|^2$
By utilizing an indicator matrix $G$, where
$G_{ik} = 1/\sqrt{|C_k|}$ if $x_i \in C_k$ (and 0 otherwise, so $G^T G = I$),
the above formulation can be transformed to
$\min_{G} \operatorname{tr}(K) - \operatorname{tr}(G^T K G)$, with kernel matrix $K_{ij} = \phi(x_i)^T \phi(x_j)$
41
Non-negative
Matrix Factorization
⚫ ‘Orthogonal NMF’ == ‘Kernel K-Means Clustering’
2. We write the NMF formulation as
$\min_{F,\, G \ge 0} \|X - F G^T\|^2$
The zero-gradient condition $\partial J / \partial F = 0$ gives $F = X G (G^T G)^{-1}$; given
$G^T G = I$, then $F = X G$, and the optimization can also be transformed to
$\min_{G^T G = I,\, G \ge 0} \|X - X G G^T\|^2$
Further transform to
$\max_{G^T G = I,\, G \ge 0} \operatorname{tr}(G^T X^T X G)$
which has the same form as kernel K-means clustering, with the linear kernel $K = X^T X$

42
SVD for Rating Prediction
However,
➢ some items are rated significantly higher than others…
➢ some users rate substantially lower than others…
➢ ratings are high overall…
Thus,
➢ add an item offset…
➢ add a user offset…
➢ add a global offset…
⚫ Baseline (bias): $b_{ui} = \mu + b_u + b_i$ (user & item deviations from the global average $\mu$)
⚫ Predict the rating as
$\hat{r}_{ui} = \mu + b_u + b_i + q_i^T p_u$
43
SVD for Rating Prediction
⚫ To prevent overfitting, we add regularization terms:
$\min_{p,q,b} \sum_{(u,i)\in\mathcal{K}} \left(r_{ui} - \mu - b_u - b_i - q_i^T p_u\right)^2 + \lambda\left(\|p_u\|^2 + \|q_i\|^2 + b_u^2 + b_i^2\right)$

⚫ SVD++: an asymmetric variation with implicit feedback
$\hat{r}_{ui} = b_{ui} + q_i^T \left( |R(u)|^{-1/2} \sum_{j \in R(u)} (r_{uj} - b_{uj})\, x_j + |N(u)|^{-1/2} \sum_{j \in N(u)} y_j \right)$
Where
$q_i$, $x_j$, $y_j$ are three item factor vectors
$R(u)$: items rated by user u
$N(u)$: items for which the user has given implicit preference
Koren Y, Bell R, Volinsky C. Matrix factorization techniques for recommender systems. Computer, 2009.
44
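A sketch extending the earlier SGD loop with the global, user, and item offsets of the regularized objective above (hyperparameters are again arbitrary):

```python
import numpy as np

R = np.array([[5, 3, 0, 1], [4, 0, 0, 1], [1, 1, 0, 5.]])
obs = np.argwhere(R > 0)
mu = R[R > 0].mean()                      # global offset
k, lr, reg = 2, 0.01, 0.02
rng = np.random.default_rng(0)
P = rng.normal(scale=0.1, size=(R.shape[0], k))
Q = rng.normal(scale=0.1, size=(R.shape[1], k))
bu = np.zeros(R.shape[0])                 # user offsets
bi = np.zeros(R.shape[1])                 # item offsets

for _ in range(2000):
    for u, i in obs:
        pred = mu + bu[u] + bi[i] + P[u] @ Q[i]
        e = R[u, i] - pred
        bu[u] += lr * (e - reg * bu[u])
        bi[i] += lr * (e - reg * bi[i])
        P[u] += lr * (e * Q[i] - reg * P[u])
        Q[i] += lr * (e * P[u] - reg * Q[i])

print(round(mu + bu[0] + bi[2] + P[0] @ Q[2], 2))  # predict user 0, item 2
```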
SVD for Rating Prediction

Further studies

Koren Y, Bell R, Volinsky C. Matrix factorization techniques for recommender systems. Computer, 2009.
45
Restricted Boltzmann Machines
⚫ Each unit has a state that can be active or not active
⚫ Each input to a unit is associated with a weight
⚫ The transfer function calculates a score for every unit based on the weighted sum of its inputs
⚫ The score is passed to the activation function, which calculates the probability of the unit being active

One pass

46
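A numpy sketch of the "one pass" in the figure: each hidden unit scores a weighted sum of its inputs, and a sigmoid activation turns that score into the probability of being active (a toy binary RBM, not yet the CF model on the next slide):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n_visible, n_hidden = 6, 3
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))  # one weight per connection
b_hidden = np.zeros(n_hidden)

v = np.array([1, 0, 1, 1, 0, 0.])           # visible states (active / not active)
score = v @ W + b_hidden                    # weighted sum of inputs per hidden unit
p_active = sigmoid(score)                   # probability each hidden unit is active
h = (rng.random(n_hidden) < p_active) * 1   # sample the binary hidden states
print(p_active, h)
```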
RBM for CF
⚫ Each visible unit = an item
⚫ Num of hidden units is a parameter
⚫ In the training phase, for each user: instantiate an RBM whose visible units correspond only to the items that user rated (weights are shared across all users), and update the weights (e.g. with contrastive divergence)

⚫ In the prediction phase: clamp the user's known ratings on the visible units, infer the hidden activations, and reconstruct a distribution over ratings for the unseen items

47
Salakhutdinov R, Mnih A, Hinton G. Restricted Boltzmann machines for collaborative filtering. ICML, 2007
Probabilistic Matrix Factorization
⚫ Viewing rating prediction probabilistically, we assume the factorized vectors of users and items follow Gaussian priors, and model the observed ratings as conditionally Gaussian:
$p(R \mid U, V, \sigma^2) = \prod_{i=1}^{N} \prod_{j=1}^{M} \left[\mathcal{N}(R_{ij} \mid U_i^T V_j, \sigma^2)\right]^{I_{ij}}$
where
$p(U \mid \sigma_U^2) = \prod_{i=1}^{N} \mathcal{N}(U_i \mid 0, \sigma_U^2 I), \qquad p(V \mid \sigma_V^2) = \prod_{j=1}^{M} \mathcal{N}(V_j \mid 0, \sigma_V^2 I)$
and $I_{ij}$ indicates whether user i rated item j
48
Salakhutdinov R, Mnih A. Probabilistic matrix factorization. NIPS, 2008.
Probabilistic Matrix Factorization
⚫ Adding the regularization terms (which follow from the Gaussian priors), the formulation becomes:
$E = \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{M} I_{ij} \left(R_{ij} - U_i^T V_j\right)^2 + \frac{\lambda_U}{2} \sum_{i=1}^{N} \|U_i\|_F^2 + \frac{\lambda_V}{2} \sum_{j=1}^{M} \|V_j\|_F^2$
Salakhutdinov R, Mnih A. Probabilistic matrix factorization. NIPS, 2008. 49


Probabilistic Matrix Factorization
⚫ To normalize the scores (i.e. 1-5) to [0, 1], the paper uses the logistic function $g(x) = 1/(1 + e^{-x})$ and maps ratings with $t(r) = (r - 1)/(K - 1)$

⚫ Thus, the final formulation can be written as:
$p(R \mid U, V, \sigma^2) = \prod_{i,j} \left[\mathcal{N}(R_{ij} \mid g(U_i^T V_j), \sigma^2)\right]^{I_{ij}}$

⚫ The model is trained by maximizing the log-posterior with gradient descent (MAP estimation)

Salakhutdinov R, Mnih A. Probabilistic matrix factorization. NIPS, 2008.


50
Probabilistic Matrix Factorization
⚫ Once a PMF model has been fitted, users with very few ratings will have feature vectors close to the prior mean, so the predicted ratings for those users will be close to the movie-average ratings.
⚫ Let $W \in \mathbb{R}^{D \times M}$ be a latent similarity constraint matrix. We define the feature vector for user i as
$U_i = Y_i + \frac{\sum_{k=1}^{M} I_{ik} W_k}{\sum_{k=1}^{M} I_{ik}}$

⚫ The corresponding constrained PMF formulation can be shown as:
$p(R \mid Y, V, W, \sigma^2) = \prod_{i,j} \left[\mathcal{N}\left(R_{ij} \mid g(U_i^T V_j), \sigma^2\right)\right]^{I_{ij}}$
Salakhutdinov R, Mnih A. Probabilistic matrix factorization. NIPS, 2008. 51


Probabilistic Matrix Factorization
⚫ Experiments: constrained PMF is especially helpful for infrequent users

Salakhutdinov R, Mnih A. Probabilistic matrix factorization. NIPS, 2008. 52


Self-Representation Model
⚫ Beyond matrix factorization, there is another form of modeling users' preferences: item self-representation
[Figure: the user-item matrix $A$ is approximated as $A \approx A\,W$, where $W$ is a sparse item-item weight matrix with a zero diagonal.]

Ning X, Karypis G. Slim: Sparse linear methods for top-n recommender systems. ICDM 2011
53
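A sketch of the SLIM idea using scikit-learn's ElasticNet as the per-item solver: each column of W is a sparse, non-negative regression of one item's column on all the others, with the item's own column zeroed out to force a zero diagonal (simplified relative to the paper's coordinate-descent implementation):

```python
import numpy as np
from sklearn.linear_model import ElasticNet

def slim(A, l1_ratio=0.5, alpha=0.1):
    """A ~= A @ W with W >= 0, sparse, and diag(W) = 0."""
    n_items = A.shape[1]
    W = np.zeros((n_items, n_items))
    for j in range(n_items):
        model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio,
                           positive=True, fit_intercept=False)
        a_j = A[:, j].copy()
        A_rest = A.copy()
        A_rest[:, j] = 0                 # zeroed column forces w_jj = 0
        model.fit(A_rest, a_j)
        W[:, j] = model.coef_
    return W

A = np.array([[1, 0, 1, 0], [1, 1, 0, 0], [0, 0, 1, 1.]])
W = slim(A)
print(np.round(A @ W, 2))                # recommendation scores per user
```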
Clustering Based CF
⚫ Goal: cluster users and compute per-cluster “typical”
preferences
⚫ Users receive recommendations computed at the cluster
level

[Figure: users partitioned into clusters $C_1$, $C_2$, $C_3$, …]

54
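A short sketch with scikit-learn's KMeans: cluster users by their rating vectors, take each cluster's mean rating vector as its "typical" preferences, and recommend the cluster's top items (toy data, my variable names):

```python
import numpy as np
from sklearn.cluster import KMeans

R = np.array([[5, 4, 0, 1], [4, 5, 1, 0],      # two "action fans"
              [0, 1, 5, 4], [1, 0, 4, 5.]])    # two "comedy fans"
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(R)

for c in range(2):
    members = R[km.labels_ == c]
    typical = members.mean(axis=0)             # per-cluster "typical" preferences
    print(f"cluster {c}: recommend items {np.argsort(-typical)[:2]}")
```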
LSH for clustering
⚫ Method for grouping similar items in highly dimensional spaces
⚫ Find a hashing function s.t. similar items are grouped in the same
buckets
⚫ Main application: nearest-neighbor search
➢ The hashing function is built iteratively by concatenating random hashing functions
➢ Addresses one of NN's main concerns: performance

55
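A sketch of one standard construction, random-hyperplane LSH for cosine similarity: concatenating random sign hashes sends similar vectors to the same bucket, so the nearest-neighbor search only scans the query's own bucket (this hash family is a common choice, not specified on the slide):

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)
dim, n_bits = 50, 8
hyperplanes = rng.normal(size=(n_bits, dim))     # concatenated random hashes

def lsh_bucket(x):
    """Sign pattern against the random hyperplanes -> bucket key."""
    return tuple((hyperplanes @ x > 0).astype(int))

users = rng.normal(size=(1000, dim))             # toy user feature vectors
buckets = defaultdict(list)
for idx, u in enumerate(users):
    buckets[lsh_bucket(u)].append(idx)

# Approximate nearest neighbors: only search the query's own bucket
query = users[0]
candidates = buckets[lsh_bucket(query)]
print(len(candidates), "candidates instead of 1000")
```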
Classifiers for CF
⚫ Classifiers are general computational models trained
using positive and negative examples
⚫ They may take as inputs:
➢ Vectors of item features (action / adventure)
➢ Preferences of customers (likes action / adventure)…
➢ Relations among items
⚫ E.g. Logistic Regression, Bayesian Networks, Support Vector
Machines, Decision Trees, etc...
⚫ Pros:
➢ Versatile
➢ Can be combined with other methods to improve accuracy
⚫ Cons:
➢ Need a relevant training set
➢ May overfit
56
Limitations of CF

⚫ Cold Start: there need to be enough other users already in the system to find a match, and new items need to get enough ratings

⚫ Popularity Bias: hard to recommend items to someone with unique tastes; tends to recommend popular items

57
Hybrid Approaches
⚫ Content-based recommendation with a Bayesian classifier
⚫ The collaborative part is standard, using Pearson correlation
⚫ Collaboration via content uses the content-based user profiles

58
Hybridization Methods

59
Social Recommendations
⚫ A social recommender system recommends items that are
“popular” in the social proximity of the user
⚫ Social proximity = trust (can also be topic-specific)
⚫ Given two individuals - the source (node A) and sink (node C) -
derive how much the source should trust the sink.
⚫ Algorithms: Advogato, Appleseed, MoleTrust, TidalTrust

60
Social Recommendations

users with similar interests are more likely to connect

61
Social Recommendations

62
Social Recommendations

63
Social Recommendations

⚫ Social connections can be used in combination with other approaches

⚫ In particular, "friendships" can be fed into collaborative filtering methods in different ways
➢ replace or modify the user-user "similarity" by using social network information
➢ use social connections as part of the ML objective function, as a regularizer (see the sketch below)
64
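As one concrete instance of the regularizer bullet above, a SocialMF-style objective (here $\beta$ and the trust edge set $E$ are modeling choices, not from the slides) adds a term that pulls connected users' factor vectors together:

```latex
\min_{P,Q}\ \sum_{(u,i)\in\mathcal{K}} \left(r_{ui} - p_u^{\top} q_i\right)^2
  + \lambda \left(\lVert P \rVert_F^2 + \lVert Q \rVert_F^2\right)
  + \beta \sum_{(u,v)\in E} \lVert p_u - p_v \rVert^2
```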
Factorization Machines

⚫ Generalization of regularized matrix (and tensor) factorization approaches combined with linear (or logistic) regression

⚫ Problem: each new adaptation of matrix or tensor factorization requires deriving new learning algorithms
➢ Hard to adapt to new domains and add data sources
➢ Hard to advance the learning algorithms across approaches
➢ Hard to incorporate non-categorical variables

Rendle S. Factorization machines with libFM. ACM TIST, 2012. 65


Factorization Machines

⚫ Approach: treat the input as a real-valued feature vector
➢ Model both linear and pair-wise interactions of the features (i.e. polynomial regression)
➢ Traditional machine learning would overfit the pairwise terms
➢ Instead, factorize the pairwise interactions between features
➢ The reduced dimensionality of the interactions promotes generalization

⚫ Combines the "generality of machine learning/regression with the quality of factorization models"

Rendle S. Factorization machines with libFM. ACM TIST, 2012. 66


Factorization Machines
⚫ Two categorical variables (u, i) encoded as a real-valued feature vector x: one-hot blocks for the user and the item

⚫ FM then becomes identical to MF with biases:
$\hat{y}(x) = w_0 + w_u + w_i + \langle v_u, v_i \rangle$
Rendle S. Factorization machines with libFM. ACM TIST, 2012. 67
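A numpy sketch of FM prediction for a single feature vector, using the O(kn) reformulation from Rendle's paper, sum_{i<j} <v_i, v_j> x_i x_j = 1/2 * sum_f [(sum_i v_{i,f} x_i)^2 - sum_i v_{i,f}^2 x_i^2]; the weights here are random placeholders rather than trained values:

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, k = 7, 4
w0 = 0.1                                          # global bias
w = rng.normal(scale=0.1, size=n_features)        # linear weights
V = rng.normal(scale=0.1, size=(n_features, k))   # factor matrix

def fm_predict(x):
    """FM: linear terms + factorized pairwise interactions in O(k*n)."""
    linear = w0 + w @ x
    s = V.T @ x                         # sum_i v_i * x_i, per factor
    s2 = (V ** 2).T @ (x ** 2)          # sum_i v_i^2 * x_i^2, per factor
    return linear + 0.5 * np.sum(s ** 2 - s2)

# One-hot user 1 and item 2 in a 3-user + 4-item encoding -> MF with biases
x = np.array([0, 1, 0, 0, 0, 1, 0.])
print(fm_predict(x))
```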


Factorization Machines
⚫ Makes it easy to add a time signal: append another one-hot (or real-valued) block for time t

⚫ The FM model then becomes:
$\hat{y}(x) = w_0 + w_u + w_i + w_t + \langle v_u, v_i \rangle + \langle v_u, v_t \rangle + \langle v_i, v_t \rangle$
Rendle S. Factorization machines with libFM. ACM TIST, 2012. 68


Index
➢ What is recommender system?

➢ Traditional Methods

➢ Deep Learning based Methods

➢ RS Systems

➢ Conclusion

69
Deep Learning based RS

MLP: easily models the nonlinear interactions between users and items;
CNN: capable of extracting local and global representations from heterogeneous data sources such as textual and visual information;
RNN: enables recommender systems to model the temporal dynamics and sequential evolution of content information
70
Deep Learning based Recommender System: A Survey and New Perspectives. ACM Computing Surveys 2019
MLP based RS

[Figures: the neural collaborative filtering framework (left) and the neural matrix factorization (NeuMF) model fusing GMF and MLP (right).]

Xiangnan He, et al. Neural Collaborative Filtering. WWW 2017 71
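A compact PyTorch sketch of the fused NeuMF model: the GMF branch takes an element-wise product of user and item embeddings, the MLP branch concatenates a second pair of embeddings through hidden layers, and a final layer fuses both (layer sizes here are illustrative, not the paper's exact configuration):

```python
import torch
import torch.nn as nn

class NeuMF(nn.Module):
    def __init__(self, n_users, n_items, k=8):
        super().__init__()
        # Separate embeddings for the GMF and MLP branches, as in the paper
        self.user_gmf = nn.Embedding(n_users, k)
        self.item_gmf = nn.Embedding(n_items, k)
        self.user_mlp = nn.Embedding(n_users, k)
        self.item_mlp = nn.Embedding(n_items, k)
        self.mlp = nn.Sequential(nn.Linear(2 * k, 16), nn.ReLU(),
                                 nn.Linear(16, 8), nn.ReLU())
        self.out = nn.Linear(k + 8, 1)   # fuses the GMF and MLP outputs

    def forward(self, u, i):
        gmf = self.user_gmf(u) * self.item_gmf(i)   # element-wise product
        mlp = self.mlp(torch.cat([self.user_mlp(u), self.item_mlp(i)], dim=-1))
        return torch.sigmoid(self.out(torch.cat([gmf, mlp], dim=-1)))

model = NeuMF(n_users=100, n_items=200)
u, i = torch.tensor([3]), torch.tensor([17])
print(model(u, i))   # predicted interaction probability
```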


AutoEncoder based RS

Item-based AutoRec model
$h(\mathbf{r}; \theta) = f\left(W \cdot g(V\mathbf{r} + \boldsymbol{\mu}) + \mathbf{b}\right)$
where the training objective (over observed ratings only) is
$\min_{\theta} \sum_{i=1}^{n} \left\|\mathbf{r}^{(i)} - h(\mathbf{r}^{(i)}; \theta)\right\|_{\mathcal{O}}^2 + \frac{\lambda}{2}\left(\|W\|_F^2 + \|V\|_F^2\right)$

Suvash Sedhain, et al. AutoRec: Autoencoders Meet Collaborative Filtering. WWW 2015


72
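A PyTorch sketch of item-based AutoRec: each training example is one item's rating vector over all users, and the loss is masked so that only observed entries (the ‖·‖_O in the objective) contribute; choosing g = sigmoid and f = identity follows one of the paper's configurations, while the sizes and optimizer are my own choices:

```python
import torch
import torch.nn as nn

class AutoRec(nn.Module):
    def __init__(self, n_users, hidden=50):
        super().__init__()
        self.encode = nn.Linear(n_users, hidden)   # V r + mu
        self.decode = nn.Linear(hidden, n_users)   # W g(.) + b

    def forward(self, r):
        return self.decode(torch.sigmoid(self.encode(r)))  # f = identity, g = sigmoid

n_users = 6
R = torch.tensor([[5, 3, 0, 1, 0, 4]], dtype=torch.float)  # one item's ratings
mask = (R > 0).float()                                     # observed entries only

model = AutoRec(n_users)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    loss = (((model(R) - R) * mask) ** 2).sum()            # ||r - h(r)||_O^2
    loss.backward()
    opt.step()
print(model(R).detach().round())
```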
CNN based RS
Music recommendation
[Figure from: Aaron van den Oord et al. Deep content-based music recommendation. NIPS 2013]

Video recommendation
[Figure from: Joonseok Lee et al. Collaborative Deep Metric Learning for Video Understanding. KDD 2018]

73
RNN based RS
GRU4Rec:
[Figure: the GRU4Rec network architecture for session-based recommendation.]

B. Hidasi et al. Session-based recommendations with recurrent neural networks. ICLR 2016


74
Index
➢ What is recommender system?

➢ Traditional Methods

➢ Deep Learning based Methods

➢ RS Systems

➢ Conclusion

75
Top 10 open-source recommender systems
⚫ SVDFeature http://svdfeature.apexlab.org/wiki/Main_Page
⚫ libMF http://www.csie.ntu.edu.tw/~cjlin/libmf/
⚫ libFM http://www.libfm.org/
⚫ Lenskit http://lenskit.grouplens.org/
⚫ GraphLab (Collaborative Filtering toolkit)
⚫ Mahout http://mahout.apache.org/
⚫ Myrrix http://myrrix.com/
⚫ EasyRec http://easyrec.org/
⚫ Waffles http://waffles.sourceforge.net/
⚫ RapidMiner http://rapidminer.com/

76
Recommender systems in industry
Video:
Netflix: a blend of many methods
Hulu: mainly item-based CF
YouTube: initially random walks, later changed to a method similar to item-based CF
Books:
Amazon: uses many methods, mainly item-based CF
News:
Google News: uses CF and Bayesian methods
Digg: popularity + topic-driven user-based CF

Music:
Last.fm: uses CF
Yahoo! Music: see Koren's papers
Pandora: the Music Genome Project, relying mainly on expert annotation
Social:
Facebook: an algorithm called EdgeRank
Twitter: the main scenario is recommending other users; see the official description
77
Widely used data
⚫ Movie
MovieLens http://grouplens.org/datasets/movielens/
Netflix https://www.netflix.com/cn/
⚫ Book
Amazon books http://www.amazon.com/b/ref=usbk_surl_books/?node=283155
Book-Crossing http://grouplens.org/datasets/book-crossing/
⚫ Music
Last.fm http://www.last.fm/
⚫ Food
Dianping http://www.dianping.com/
⚫ Others…
Epinion http://www.datatang.com/data/11849

78
Index
➢ What is recommender system?

➢ Traditional Methods

➢ Deep Learning based Methods

➢ RS Systems

➢ Conclusion

79
Challenging Problems
⚫ Data sparsity:
➢ Netflix dataset: nearly 480,000 users and 17,770 items, with fewer than 1% of the ratings observed

⚫ Curse of dimensionality:
➢ Users' features can be represented in many ways
⚫ Cold start:
➢ Many new users sign up and many new items are added
⚫ Personalization:
➢ Different users have different tastes

80
Thanks!
Q&A

81
