Multi-Agent Cooperation Q-Learning Algorithm Based on Constrained Markov Game

Yangyang Ge1, Fei Zhu1, 2, Wei Huang1, Peiyao Zhao1 and Quan Liu1

  1. School of Computer Science and Technology, Soochow University
    Suzhou Jiangsu 215006, China
    20184227043@stu.suda.edu.cn, {zhufei, huangwei}@suda.edu.cn, 20195427013@stu.suda.edu.cn, quanliu@suda.edu.cn
  2. Provincial Key Laboratory for Computer Information Processing Technology
    Soochow University, Suzhou 215006, China

Abstract

Multi-agent systems have broad applications in the real world, yet their safety is rarely considered. Reinforcement learning is one of the most important methods for solving multi-agent problems. Multi-agent reinforcement learning has made progress in areas such as robotic systems, human-machine games, and automation. In these areas, however, an agent may fall into unsafe states, where it may find it difficult to bypass obstacles, to receive information from other agents, and so on. Ensuring the safety of a multi-agent system is of great importance in these areas, because an agent may enter dangerous states that are irreversible and cause great damage. To address the safety problem, this paper introduces a Multi-Agent Cooperation Q-Learning Algorithm based on Constrained Markov Game. In this method, safety constraints are imposed on the action set, and each agent, when interacting with the environment to search for optimal values, must obey the safety rules, so as to obtain an optimal policy that satisfies the safety requirements. Since traditional multi-agent reinforcement learning algorithms are no longer suitable for the proposed model, a new solution is introduced for computing the globally optimal state-action function that satisfies the safety constraints. Under the condition that the state-action function and the constraint functions are both differentiable, we linearize the constraint functions and apply the Lagrange multiplier method to determine the optimal action that can be performed in the current state. This not only improves the efficiency and accuracy of the algorithm, but also guarantees a globally optimal solution. Experiments verify the effectiveness of the algorithm.
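The constrained-Q-learning idea sketched in the abstract can be illustrated with a minimal tabular primal-dual example. This is not the authors' algorithm (which operates on a multi-agent constrained Markov game); it is a hedged single-agent simplification in which a Lagrange multiplier `lam` penalizes actions whose accumulated safety cost exceeds a budget `d_max`. The environment `toy_step`, the budget, and all hyperparameters are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def constrained_q_learning(n_states, n_actions, step, episodes=200,
                           alpha=0.1, gamma=0.95, eta=0.01, d_max=0.2,
                           eps=0.1, seed=0):
    """Tabular primal-dual constrained Q-learning (single-agent sketch).

    `step(s, a)` returns (next_state, reward, cost, done). The dual
    variable `lam` is raised whenever an episode's total safety cost
    exceeds the budget `d_max`, discouraging unsafe actions.
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    lam = 0.0
    for _ in range(episodes):
        s, done, ep_cost, t = 0, False, 0.0, 0
        while not done and t < 100:
            # epsilon-greedy exploration over the Lagrangian Q-values
            a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
            s2, r, c, done = step(s, a)
            # Lagrangian reward: trade off task reward against safety cost
            target = (r - lam * c) + gamma * (0.0 if done else Q[s2].max())
            Q[s, a] += alpha * (target - Q[s, a])
            ep_cost += c
            s, t = s2, t + 1
        # dual ascent: increase lam when the episode violated the cost budget
        lam = max(0.0, lam + eta * (ep_cost - d_max))
    return Q, lam

def toy_step(s, a):
    """3-state chain (hypothetical): action 1 earns more reward but
    incurs safety cost 1; action 0 is safe but earns less."""
    s2 = min(s + 1, 2)
    done = s2 >= 2
    if a == 1:
        return s2, 1.0, 1.0, done
    return s2, 0.5, 0.0, done

Q, lam = constrained_q_learning(n_states=3, n_actions=2, step=toy_step)
```

As `lam` grows, the effective value of the costly action shrinks, so the greedy policy drifts toward the safe action; the paper's method instead solves the constrained optimum jointly over all agents via linearized constraints.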

Key words

Markov game, distributed perception, multi-agent cooperation, constrained Markov decision process

Digital Object Identifier (DOI)

https://doi.org/10.2298/CSIS191220009G

Publication information

Volume 17, Issue 2 (June 2020)
Year of Publication: 2020
ISSN: 2406-1018 (Online)
Publisher: ComSIS Consortium

Full text

Available in PDF (Portable Document Format)

How to cite

Ge, Y., Zhu, F., Huang, W., Zhao, P., Liu, Q.: Multi-Agent Cooperation Q-Learning Algorithm Based on Constrained Markov Game. Computer Science and Information Systems, Vol. 17, No. 2, 647–664. (2020), https://doi.org/10.2298/CSIS191220009G