Regularization of the Policy Updates for Stabilizing Mean Field Games

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2023)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13936)

Included in the following conference series: PAKDD (Pacific-Asia Conference on Knowledge Discovery and Data Mining)


Abstract

This work studies non-cooperative Multi-Agent Reinforcement Learning (MARL), in which multiple agents interact in the same environment and each seeks to maximize its individual return. Scaling up the number of agents is challenging because of the non-stationarity that many simultaneously learning agents introduce. To address this issue, Mean Field Games (MFG) rely on symmetry and homogeneity assumptions to approximate games with very large populations. Recently, deep Reinforcement Learning has been used to scale MFG to games with a larger number of states. Current methods rely on smoothing techniques such as averaging the Q-values or the updates of the mean-field distribution. This work presents a different approach to stabilizing learning, based on proximal updates of the mean-field policy. We name our algorithm Mean Field Proximal Policy Optimization (MF-PPO) and empirically show its effectiveness in the OpenSpiel framework.
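
The full MF-PPO algorithm is in the paper itself, but the proximal-update idea the abstract refers to builds on the PPO clipped surrogate objective of Schulman et al. The sketch below is our own minimal illustration of that objective, not the authors' code (all function and variable names are ours): the probability ratio between the new policy and the previous policy is clipped, so each update stays close to the last iterate, which is the stabilizing mechanism described above.

```python
import numpy as np

def clipped_surrogate(new_logp, old_logp, advantages, eps=0.2):
    """PPO clipped surrogate objective (Schulman et al., 2017).

    new_logp / old_logp: log pi_new(a|s) and log pi_old(a|s) for a
    batch of sampled state-action pairs; advantages: their estimated
    advantages. Clipping the ratio to [1 - eps, 1 + eps] removes the
    incentive to move the policy far from the previous one.
    """
    ratio = np.exp(new_logp - old_logp)            # pi_new / pi_old
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    # Pessimistic (elementwise minimum) bound, averaged over the batch;
    # gradient ascent on this objective yields the proximal update.
    return np.mean(np.minimum(ratio * advantages, clipped * advantages))

# Toy usage with fabricated numbers:
rng = np.random.default_rng(0)
logp_old = np.log(rng.uniform(0.1, 0.9, size=8))
logp_new = logp_old + rng.normal(scale=0.1, size=8)
adv = rng.normal(size=8)
print(clipped_surrogate(logp_new, logp_old, adv))
```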


Notes

  1. Code available at: https://github.com/Optimization-and-Machine-Learning-Lab/open_spiel/tree/master/open_spiel/python/mfg.
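
For readers who want to try the environments, OpenSpiel exposes its mean-field games through the standard pyspiel API. A minimal sketch follows, assuming the built-in game name "mfg_crowd_modelling" is registered in your OpenSpiel build; which games and parameters the paper's experiments actually use is not stated here.

```python
# Minimal sketch: load one of OpenSpiel's built-in mean-field games.
# Assumes `pip install open_spiel`; the game name "mfg_crowd_modelling"
# is an assumption about what is registered in your build, and may
# differ from the games used in the paper's experiments.
import pyspiel

game = pyspiel.load_game("mfg_crowd_modelling")
state = game.new_initial_state()
print("game:", game.get_type().short_name)
print("actions:", game.num_distinct_actions())
```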


Author information

Correspondence to Talal Algumaei.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Algumaei, T., Solozabal, R., Alami, R., Hacid, H., Debbah, M., Takáč, M. (2023). Regularization of the Policy Updates for Stabilizing Mean Field Games. In: Kashima, H., Ide, T., Peng, WC. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2023. Lecture Notes in Computer Science (LNAI), vol 13936. Springer, Cham. https://doi.org/10.1007/978-3-031-33377-4_28

  • DOI: https://doi.org/10.1007/978-3-031-33377-4_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-33376-7

  • Online ISBN: 978-3-031-33377-4

  • eBook Packages: Computer Science, Computer Science (R0)
