Abstract
Reinforcement learning (RL) has recently been applied to many cooperative multi-agent systems. However, most research has been carried out on systems with a fixed number of agents. In reality, the number of agents in a system often changes over time, and the majority of multi-agent reinforcement learning (MARL) models cannot work robustly on such systems. In this paper, we propose a model that extends the actor-critic framework to handle systems with a variable quantity of agents. To deal with the variable quantity issue, we design a feature extractor that embeds variable-length states. By employing a bidirectional long short-term memory (BLSTM) network in the actor network, which is capable of processing variable-length sequences, any number of agents can communicate and coordinate with each other. Because the BLSTM is generally used to process sequences, we use the critic network as an importance estimator for all agents and organize them into a sequence. Experiments show that our model works well in the variable quantity situation and outperforms other models. Although our model may perform poorly when the number of agents is too large, it can be fine-tuned without changing hyper-parameters and achieve acceptable performance in a short time.
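The ordering-then-bidirectional-pass idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: each agent's embedded state is reduced to a single float, the critic is a hypothetical scoring function passed in by the caller, sorting from highest to lowest score is an assumed ordering rule, and a plain tanh recurrence stands in for the BLSTM. The names `order_by_importance`, `bidirectional_pass`, `w_in`, and `w_rec` are illustrative only.

```python
import math


def order_by_importance(agent_states, critic):
    """Arrange agents into a sequence using a critic score.

    Assumption: higher score means more important, and the most
    important agent comes first. The source does not state the order.
    """
    return sorted(agent_states, key=critic, reverse=True)


def bidirectional_pass(sequence, w_in, w_rec):
    """Toy bidirectional recurrence over a sequence of any length.

    A scalar tanh recurrence is used as a stand-in for a BLSTM.
    Returns one (forward, backward) pair of hidden values per agent.
    """
    def run(seq):
        hidden, outputs = 0.0, []
        for x in seq:
            hidden = math.tanh(w_in * x + w_rec * hidden)
            outputs.append(hidden)
        return outputs

    forward = run(sequence)
    # Process the reversed sequence, then flip back so index i
    # still refers to agent i.
    backward = run(sequence[::-1])[::-1]
    return list(zip(forward, backward))
```

Because both functions iterate over whatever sequence they receive, the same weights can be applied to three agents or to thirty, which is the property the abstract relies on. A call such as `bidirectional_pass(order_by_importance(states, critic), 0.5, 0.5)` yields one pair of hidden values per agent, each influenced by the agents before and after it in the sequence.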
Copyright information
© 2019 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Wang, G., Shi, J. (2019). Actor-Critic for Multi-agent System with Variable Quantity of Agents. In: Li, B., Yang, M., Yuan, H., Yan, Z. (eds) IoT as a Service. IoTaaS 2018. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 271. Springer, Cham. https://doi.org/10.1007/978-3-030-14657-3_5
DOI: https://doi.org/10.1007/978-3-030-14657-3_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-14656-6
Online ISBN: 978-3-030-14657-3
eBook Packages: Computer Science, Computer Science (R0)