Abstract
This work studies non-cooperative Multi-Agent Reinforcement Learning (MARL), in which multiple agents interact in the same environment and each aims to maximize its individual return. Scaling up the number of agents is challenging because of the non-stationarity that many simultaneously learning agents introduce. To address this issue, Mean Field Games (MFG) rely on symmetry and homogeneity assumptions to approximate games with very large populations. Recently, deep Reinforcement Learning has been used to scale MFG to games with a larger number of states. Current methods rely on smoothing techniques, such as averaging the Q-values or the updates to the mean-field distribution. This work presents a different approach to stabilizing learning, based on proximal updates to the mean-field policy. We name our algorithm Mean Field Proximal Policy Optimization (MF-PPO), and we empirically demonstrate its effectiveness in the OpenSpiel framework.
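As a rough illustration of what a proximal update on the mean-field policy looks like, the sketch below applies PPO's clipped surrogate objective (Schulman et al., 2017) to the policy of a single representative agent. This is a minimal sketch under stated assumptions (PyTorch, discrete actions, a vectorized state encoding), not the authors' implementation; the names MeanFieldPolicy and ppo_update, and all hyperparameters, are hypothetical.

```python
# Illustrative sketch only: PPO's clipped ("proximal") update applied to the
# policy of a representative agent in a mean-field setting. Not taken from the
# paper; the clipped surrogate step is standard PPO (Schulman et al., 2017).
import torch
import torch.nn as nn

class MeanFieldPolicy(nn.Module):
    """Policy of a representative agent. In a full mean-field loop, the
    population distribution mu would be recomputed from this policy between
    iterations; only the proximal policy step is shown here."""
    def __init__(self, state_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, n_actions),
        )

    def log_prob(self, states, actions):
        logits = self.net(states)
        return torch.distributions.Categorical(logits=logits).log_prob(actions)

def ppo_update(policy, optimizer, states, actions, advantages,
               old_log_probs, epsilon=0.2, epochs=4):
    """Clipped surrogate objective: the probability ratio pi_new / pi_old is
    clipped to [1 - epsilon, 1 + epsilon], which limits how far the mean-field
    policy can move per iteration -- the proximal regularization that counters
    the non-stationarity of the induced population distribution."""
    for _ in range(epochs):
        log_probs = policy.log_prob(states, actions)
        ratio = torch.exp(log_probs - old_log_probs)
        clipped = torch.clamp(ratio, 1.0 - epsilon, 1.0 + epsilon)
        loss = -torch.min(ratio * advantages, clipped * advantages).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# Toy usage with random data; in practice, advantages come from rollouts of
# the representative agent against the current population distribution mu.
policy = MeanFieldPolicy(state_dim=8, n_actions=4)
opt = torch.optim.Adam(policy.parameters(), lr=3e-4)
s, a = torch.randn(32, 8), torch.randint(0, 4, (32,))
adv = torch.randn(32)
old_lp = policy.log_prob(s, a).detach()
ppo_update(policy, opt, s, a, adv, old_lp)
```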
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Algumaei, T., Solozabal, R., Alami, R., Hacid, H., Debbah, M., Takáč, M. (2023). Regularization of the Policy Updates for Stabilizing Mean Field Games. In: Kashima, H., Ide, T., Peng, W.C. (eds.) Advances in Knowledge Discovery and Data Mining. PAKDD 2023. Lecture Notes in Computer Science, vol. 13936. Springer, Cham. https://doi.org/10.1007/978-3-031-33377-4_28
DOI: https://doi.org/10.1007/978-3-031-33377-4_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-33376-7
Online ISBN: 978-3-031-33377-4
eBook Packages: Computer Science, Computer Science (R0)