MACRPO: Multi-agent cooperative recurrent policy optimization

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

This work considers the problem of learning cooperative policies in multi-agent settings with partially observable and non-stationary environments and no communication channel. We focus on improving information sharing between agents and propose a new multi-agent actor-critic method called Multi-Agent Cooperative Recurrent Proximal Policy Optimization (MACRPO). MACRPO integrates information across agents and time in two novel ways. First, we use a recurrent layer in the critic’s network architecture and propose a new framework that trains this layer on a meta-trajectory. This allows the network to learn the cooperation and the dynamics of interactions between agents, and also to handle partial observability. Second, we propose a new advantage function that incorporates other agents’ rewards and value functions, with a parameter that controls the level of cooperation between agents. This control parameter is suited to environments in which the agents are unable to fully cooperate with each other. We evaluate our algorithm on three challenging multi-agent environments with continuous and discrete action spaces: Deepdrive-Zero, Multi-Walker, and Particle environment. We compare the results with several ablations, with state-of-the-art multi-agent algorithms such as MAGIC, IC3Net, CommNet, GA-Comm, QMIX, MADDPG, and RMAPPO, and with single-agent methods that share parameters between agents, such as IMPALA and APEX. In these experiments, MACRPO outperforms the other algorithms. The code is available online at https://github.com/kargarisaac/macrpo.
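
The second idea in the abstract, an advantage that blends in other agents' rewards and value estimates under a cooperation parameter, can be illustrated with a minimal sketch. This is not the paper's formula: the function name `cooperative_advantages`, the parameter name `beta`, the simple one-step (TD) form, and the choice to add `beta` times the sum of the other agents' signals are all assumptions made here for illustration. The abstract states only that such a parameter exists.

```python
def cooperative_advantages(rewards, values, next_values, beta=0.5, gamma=0.99):
    """One-step advantage per agent, mixing in other agents' signals.

    Hypothetical illustration only (not taken from the paper):
    beta = 0 gives each agent its own independent advantage,
    beta = 1 weights every other agent's reward and value like its own.
    """
    def mix(xs, i):
        # Own signal plus beta times the summed signals of all other agents.
        others = sum(xs) - xs[i]
        return xs[i] + beta * others

    advantages = []
    for i in range(len(rewards)):
        r = mix(rewards, i)            # blended reward for agent i
        v = mix(values, i)             # blended value of the current state
        v_next = mix(next_values, i)   # blended value of the next state
        advantages.append(r + gamma * v_next - v)  # one-step TD advantage
    return advantages


if __name__ == "__main__":
    # Two agents; only the first one receives a reward at this step.
    for beta in (0.0, 0.5, 1.0):
        print(beta, cooperative_advantages([1.0, 0.0], [0.0, 0.0], [0.0, 0.0], beta=beta))
```

With `beta` between 0 and 1, the second agent receives partial credit for the first agent's reward, which is the kind of partial cooperation the abstract describes. The actual definition is in the article and the linked repository.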
Original language: English
Article number: 1394209
Number of pages: 15
Journal: Frontiers in Robotics and AI
Volume: 11
Publication status: Published - 2024
MoE publication type: A1 Journal article-refereed

Keywords

  • information sharing
  • multi-agent
  • interaction
  • cooperative
  • policy
  • reinforcement learning
