DOI: 10.1145/3640457.3691698

Multi-Preview Recommendation via Reinforcement Learning

Published: 08 October 2024

Abstract

Preview recommendations serve as a crucial shortcut for attracting users’ attention on various systems, platforms, and webpages, significantly boosting user engagement. However, the variability of preview types and the flexibility of preview duration make it challenging to use an integrated framework for multi-preview recommendations under resource constraints. In this paper, we present an approach that incorporates constrained Q-learning into a notification recommendation system, effectively handling both multi-preview ranking and duration orchestration by targeting long-term user retention. Our method bridges the gap between combinatorial reinforcement learning, which often remains too theoretical for practical use, and segmented modules in production, where model performance is typically compromised due to over-simplification. We demonstrate the superiority of our approach through off-policy evaluation and online A/B testing using Microsoft data.
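
For readers unfamiliar with the constrained Q-learning machinery mentioned above, the following is a minimal sketch of one common formulation: a Q-learning update whose reward is shaped by a Lagrange multiplier that penalizes resource cost (for example, total preview duration) whenever it exceeds a budget. Everything here is an illustrative assumption, including the toy environment, the state and action encodings, and all hyperparameter values; it is not the paper's model, features, or constraint-handling scheme.

```python
# Illustrative sketch (not the paper's implementation): constrained Q-learning
# via a Lagrangian penalty. The environment, rewards, and costs are synthetic.
import random
from collections import defaultdict

ALPHA = 0.1          # Q-learning step size
GAMMA = 0.9          # discount factor
ETA = 0.01           # dual (Lagrange multiplier) step size
BUDGET = 2.0         # allowed per-step cost (e.g., preview seconds)
ACTIONS = [0, 1, 2]  # e.g., candidate preview slots / durations

q = defaultdict(float)  # Q(s, a), keyed by (state, action)
lam = 0.0               # Lagrange multiplier for the cost constraint

def step(state, action):
    """Toy environment: returns (next_state, reward, cost)."""
    reward = random.random() * (action + 1)  # engagement proxy
    cost = float(action)                     # resource-usage proxy
    next_state = (state + action) % 5
    return next_state, reward, cost

def greedy(state):
    return max(ACTIONS, key=lambda a: q[(state, a)])

state = 0
for t in range(10_000):
    # epsilon-greedy behavior policy
    action = random.choice(ACTIONS) if random.random() < 0.1 else greedy(state)
    next_state, reward, cost = step(state, action)

    # Lagrangian reward: trade engagement off against resource cost
    shaped = reward - lam * cost

    # standard Q-learning update on the shaped reward
    td_target = shaped + GAMMA * max(q[(next_state, a)] for a in ACTIONS)
    q[(state, action)] += ALPHA * (td_target - q[(state, action)])

    # dual ascent: raise lambda when cost exceeds the budget, relax otherwise
    lam = max(0.0, lam + ETA * (cost - BUDGET))

    state = next_state

print("learned multiplier:", round(lam, 3))
print("greedy action from state 0:", greedy(0))
```

In the paper's setting, the state would presumably encode user and context signals, the action would be a ranked combination of previews together with their durations, and the reward would target long-term user retention rather than the synthetic per-step signal used in this sketch.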


Published In

RecSys '24: Proceedings of the 18th ACM Conference on Recommender Systems
October 2024
1438 pages
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Combinatorial Reinforcement Learning
  2. Constrained Q-learning
  3. Duration
  4. Multi-Preview Ranking
  5. Notification Recommendation System

Qualifiers

  • Demonstration
  • Research
  • Refereed limited


Acceptance Rates

Overall Acceptance Rate 254 of 1,295 submissions, 20%
