DOI: 10.1145/3219819.3220021
Research Article

Ranking Distillation: Learning Compact Ranking Models With High Performance for Recommender System

Published: 19 July 2018

Abstract

We propose a novel way to train ranking models, such as recommender systems, that are both effective and efficient. Knowledge distillation (KD) has been shown to achieve both effectiveness and efficiency in image recognition. We propose a KD technique for learning-to-rank problems, called ranking distillation (RD). Specifically, we train a smaller student model to learn to rank documents/items from both the training data and the supervision of a larger teacher model. The student model achieves a ranking performance similar to that of the large teacher model, but its smaller size makes online inference more efficient. RD is flexible because it is orthogonal to the choice of ranking models for the teacher and the student. We address the challenges of applying RD to ranking problems. Experiments on public data sets and state-of-the-art recommendation models show that RD achieves its design purposes: the student model learnt with RD is less than half the size of the teacher model, yet it achieves a ranking performance similar to that of the teacher model and much better than that of the same student model learnt without RD.
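
To make the training objective concrete, the sketch below shows one plausible form of a ranking-distillation loss: a pointwise ranking loss on the ground-truth positive items plus a distillation term that treats the teacher's top-K ranked items as weighted extra positives. This is a minimal illustration under stated assumptions, not the paper's exact formulation: the function name `ranking_distillation_loss`, the sigmoid log-loss, and the simple 1/rank position weights are choices made for the example.

```python
import torch

def ranking_distillation_loss(student_scores, pos_items, teacher_top_k, lam=0.5):
    """Illustrative ranking-distillation loss for one user (assumed form).

    student_scores : (num_items,) raw student scores over the item catalog
    pos_items      : LongTensor of ground-truth positive item ids
    teacher_top_k  : LongTensor of the teacher model's top-K ranked item ids
    lam            : hyperparameter balancing the two loss terms
    """
    probs = torch.sigmoid(student_scores)

    # Ranking term: push the student's scores on observed positives toward 1.
    ranking_loss = -torch.log(probs[pos_items] + 1e-10).mean()

    # Distillation term: treat the teacher's top-K items as extra positives,
    # weighted so that items the teacher ranks higher contribute more.
    # A simple 1/rank decay is assumed here for illustration.
    ranks = torch.arange(1, len(teacher_top_k) + 1, dtype=torch.float32)
    weights = (1.0 / ranks) / (1.0 / ranks).sum()
    distill_loss = -(weights * torch.log(probs[teacher_top_k] + 1e-10)).sum()

    return ranking_loss + lam * distill_loss
```

At serving time only the compact student is queried; the teacher provides a training-time signal only, which is why the approach is orthogonal to the choice of teacher and student architectures.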

Supplementary Material

MP4 File (tang_compact_ranking_models.mp4)

      Published In

      KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
      July 2018
      2925 pages
      ISBN:9781450355520
      DOI:10.1145/3219819
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. knowledge transfer
      2. learning to rank
      3. model compression
      4. recommender system

      Funding Sources

• Natural Sciences and Engineering Research Council of Canada

      Conference

      KDD '18

      Acceptance Rates

KDD '18 Paper Acceptance Rate: 107 of 983 submissions, 11%
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%

      Cited By

• (2025) Invariant debiasing learning for recommendation via biased imputation. Information Processing & Management 62(3):104028. DOI: 10.1016/j.ipm.2024.104028. Online publication date: May 2025.
• (2024) Post-Training Attribute Unlearning in Recommender Systems. ACM Transactions on Information Systems 43(1):1-28. DOI: 10.1145/3701987. Online publication date: 6 Nov 2024.
• (2024) FINEST: Stabilizing Recommendations by Rank-Preserving Fine-Tuning. ACM Transactions on Knowledge Discovery from Data 18(9):1-22. DOI: 10.1145/3695256. Online publication date: 1 Nov 2024.
• (2024) A Survey on Recommender Systems Using Graph Neural Network. ACM Transactions on Information Systems 43(1):1-49. DOI: 10.1145/3694784. Online publication date: 26 Nov 2024.
• (2024) Distributed Recommendation Systems: Survey and Research Directions. ACM Transactions on Information Systems 43(1):1-38. DOI: 10.1145/3694783. Online publication date: 6 Sep 2024.
• (2024) A Self-Distilled Learning to Rank Model for Ad Hoc Retrieval. ACM Transactions on Information Systems 42(6):1-28. DOI: 10.1145/3681784. Online publication date: 25 Jul 2024.
• (2024) Mitigating Sample Selection Bias with Robust Domain Adaption in Multimedia Recommendation. In Proceedings of the 32nd ACM International Conference on Multimedia, 7581-7590. DOI: 10.1145/3664647.3680615. Online publication date: 28 Oct 2024.
• (2024) Distillation vs. Sampling for Efficient Training of Learning to Rank Models. In Proceedings of the 2024 ACM SIGIR International Conference on Theory of Information Retrieval, 51-60. DOI: 10.1145/3664190.3672527. Online publication date: 2 Aug 2024.
• (2024) Unbiased, Effective, and Efficient Distillation from Heterogeneous Models for Recommender Systems. ACM Transactions on Recommender Systems. DOI: 10.1145/3649443. Online publication date: 23 Feb 2024.
• (2024) Distillation Matters: Empowering Sequential Recommenders to Match the Performance of Large Language Models. In Proceedings of the 18th ACM Conference on Recommender Systems, 507-517. DOI: 10.1145/3640457.3688118. Online publication date: 8 Oct 2024.