MACRPO: Multi-agent cooperative recurrent policy optimization

Research output: Contribution to journal › Article › Scientific › peer-review



This work considers the problem of learning cooperative policies in multi-agent settings with partially observable and non-stationary environments without a communication channel. We focus on improving information sharing between agents and propose a new multi-agent actor-critic method called Multi-Agent Cooperative Recurrent Proximal Policy Optimization (MACRPO). We propose two novel ways of integrating information across agents and time in MACRPO: First, we use a recurrent layer in the critic’s network architecture and propose a new framework to use the proposed meta-trajectory to train the recurrent layer. This allows the network to learn the cooperation and dynamics of interactions between agents, and also handle partial observability. Second, we propose a new advantage function that incorporates other agents’ rewards and value functions by controlling the level of cooperation between agents using a parameter. The use of this control parameter is suitable for environments in which the agents are unable to fully cooperate with each other. We evaluate our algorithm on three challenging multi-agent environments with continuous and discrete action spaces, Deepdrive-Zero, Multi-Walker, and Particle environment. We compare the results with several ablations and state-of-the-art multi-agent algorithms such as MAGIC, IC3Net, CommNet, GA-Comm, QMIX, MADDPG, and RMAPPO, and also single-agent methods with shared parameters between agents such as IMPALA and APEX. The results show superior performance against other algorithms. The code is available online at
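The abstract describes the cooperation-controlled advantage only in words, so the following is a minimal sketch of one way such an advantage could be computed, not the formula from the article. It assumes that each agent's reward and value estimate are augmented with the sum of the other agents' rewards and values, scaled by a cooperation parameter (called `beta` here), and that the result is passed through standard generalized advantage estimation. The function names, the `beta` symbol, and the mixing rule are all hypothetical.

```python
from typing import List


def mix_with_others(per_agent: List[List[float]], beta: float) -> List[List[float]]:
    """Return x_i[t] + beta * sum over j != i of x_j[t] for every agent i and step t.

    ASSUMPTION: this linear mixing rule is illustrative only; the abstract
    does not state the actual form used by MACRPO.
    """
    n_agents = len(per_agent)
    horizon = len(per_agent[0])
    totals = [sum(per_agent[j][t] for j in range(n_agents)) for t in range(horizon)]
    return [
        [per_agent[i][t] + beta * (totals[t] - per_agent[i][t]) for t in range(horizon)]
        for i in range(n_agents)
    ]


def gae(rewards: List[float], values: List[float],
        gamma: float = 0.99, lam: float = 0.95) -> List[float]:
    """Standard generalized advantage estimation for one trajectory.

    `values` must have len(rewards) + 1 entries; the last one is the
    bootstrap value of the state after the final step.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def cooperative_advantages(rewards: List[List[float]], values: List[List[float]],
                           beta: float, gamma: float = 0.99,
                           lam: float = 0.95) -> List[List[float]]:
    """Per-agent advantages computed from beta-mixed rewards and values (hypothetical)."""
    mixed_rewards = mix_with_others(rewards, beta)
    mixed_values = mix_with_others(values, beta)
    return [gae(r, v, gamma, lam) for r, v in zip(mixed_rewards, mixed_values)]
```

Under these assumptions, `beta = 0` reduces to each agent optimizing only its own return, while `beta = 1` makes every agent optimize the team return; intermediate values correspond to the partial cooperation the abstract mentions.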
Original language: English
Article number: 1394209
Number of pages: 15
Journal: Frontiers in Robotics and AI
Publication status: Published - 2024
MoE publication type: A1 Journal article-refereed


  • information sharing
  • multi-agent
  • interaction
  • cooperative
  • policy
  • reinforcement learning

