Abstract
This work studies non-cooperative Multi-Agent Reinforcement Learning (MARL), in which multiple agents interact in the same environment and each aims to maximize its individual return. Scaling up the number of agents is challenging because of the non-stationarity that many simultaneously learning agents introduce. To address this issue, Mean Field Games (MFG) rely on symmetry and homogeneity assumptions to approximate games with very large populations. Recently, deep Reinforcement Learning has been used to scale MFG to games with a larger number of states. Current methods rely on smoothing techniques, such as averaging the Q-values or the updates to the mean-field distribution. This work presents a different approach to stabilizing learning, based on proximal updates to the mean-field policy. We name our algorithm Mean Field Proximal Policy Optimization (MF-PPO), and we empirically demonstrate its effectiveness in the OpenSpiel framework.
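As a rough illustration of what a proximal update on the mean-field policy looks like, the sketch below applies PPO's clipped surrogate objective (Schulman et al., 2017) to the policy of a single representative agent. This is a minimal sketch under stated assumptions (PyTorch, discrete actions, a vectorized state encoding), not the authors' implementation; the names MeanFieldPolicy and ppo_update, and all hyperparameters, are hypothetical.

```python
# Illustrative sketch only: PPO's clipped ("proximal") update applied to the
# policy of a representative agent in a mean-field setting. Not taken from the
# paper; the clipped surrogate step is standard PPO (Schulman et al., 2017).
import torch
import torch.nn as nn

class MeanFieldPolicy(nn.Module):
    """Policy of a representative agent. In a full mean-field loop, the
    population distribution mu would be recomputed from this policy between
    iterations; only the proximal policy step is shown here."""
    def __init__(self, state_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, n_actions),
        )

    def log_prob(self, states, actions):
        logits = self.net(states)
        return torch.distributions.Categorical(logits=logits).log_prob(actions)

def ppo_update(policy, optimizer, states, actions, advantages,
               old_log_probs, epsilon=0.2, epochs=4):
    """Clipped surrogate objective: the probability ratio pi_new / pi_old is
    clipped to [1 - epsilon, 1 + epsilon], which limits how far the mean-field
    policy can move per iteration -- the proximal regularization that counters
    the non-stationarity of the induced population distribution."""
    for _ in range(epochs):
        log_probs = policy.log_prob(states, actions)
        ratio = torch.exp(log_probs - old_log_probs)
        clipped = torch.clamp(ratio, 1.0 - epsilon, 1.0 + epsilon)
        loss = -torch.min(ratio * advantages, clipped * advantages).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# Toy usage with random data; in practice, advantages come from rollouts of
# the representative agent against the current population distribution mu.
policy = MeanFieldPolicy(state_dim=8, n_actions=4)
opt = torch.optim.Adam(policy.parameters(), lr=3e-4)
s, a = torch.randn(32, 8), torch.randint(0, 4, (32,))
adv = torch.randn(32)
old_lp = policy.log_prob(s, a).detach()
ppo_update(policy, opt, s, a, adv, old_lp)
```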
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Algumaei, T., Solozabal, R., Alami, R., Hacid, H., Debbah, M., Takáč, M. (2023). Regularization of the Policy Updates for Stabilizing Mean Field Games. In: Kashima, H., Ide, T., Peng, W.C. (eds.) Advances in Knowledge Discovery and Data Mining. PAKDD 2023. Lecture Notes in Computer Science, vol. 13936. Springer, Cham. https://doi.org/10.1007/978-3-031-33377-4_28
DOI: https://doi.org/10.1007/978-3-031-33377-4_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-33376-7
Online ISBN: 978-3-031-33377-4
eBook Packages: Computer Science, Computer Science (R0)