A partially observable multi-ship collision avoidance decision-making model based on deep reinforcement learning

Kangjie Zheng, Xinyu Zhang*, Chengbo Wang*, Mingyang Zhang, Hao Cui

*Corresponding author for this work

Research output: Contribution to journal › Article › Scientific › peer-review

29 Citations (Scopus)

Abstract

Unmanned ships have drawn widespread attention for their potential to enhance navigational safety, minimize human error, and improve shipping efficiency. Nevertheless, the complexity and uncertainty of mixed obstacle environments pose significant challenges to their development, particularly in collision avoidance decision-making. This paper formulates collision avoidance decision-making for autonomous ships in mixed obstacle environments as a Partially Observable Markov Decision Process (POMDP), which can address the environment's complexity and uncertainty and improve decision accuracy. An image-state observation method is proposed, since images can provide more accurate, rich, and reliable information. A dense reward function is designed to mitigate the sparse-reward problem during training. The Proximal Policy Optimization (PPO) algorithm is used for model training. On this basis, a route guidance method, PPO for POMDP with guidelines under dense reward (G-IPOMDP-PPO), is proposed to improve training efficiency. Simulations are conducted in various mixed obstacle environments, and the results are compared with those of conventional algorithms. The results show that the proposed model can make collision avoidance decisions safely and efficiently in complex and uncertain environments. This research provides a new solution and theoretical foundation for developing autonomous ships and can be extended to dynamic interactive collision avoidance in mixed obstacle environments.
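
The abstract does not give the form of the dense reward, so the following is only a minimal sketch of the general idea, not the authors' function. It contrasts a sparse reward, which is non-zero only at terminal events, with a shaped reward that also gives feedback on every step. The function names, the 2-D point representation of positions, and all weights and radii are hypothetical values chosen for illustration.

```python
import math


def sparse_reward(reached_goal: bool, collided: bool) -> float:
    """Reward only at terminal events; zero on every other step."""
    if collided:
        return -1.0
    if reached_goal:
        return 1.0
    return 0.0


def dense_reward(prev_pos, pos, goal, obstacles,
                 collision_radius=0.2, safe_dist=1.0, goal_radius=0.5,
                 progress_weight=0.1, proximity_weight=0.5):
    """Hypothetical shaped reward (all parameters are assumptions).

    Terminal events keep the sparse values; every other step is rewarded
    for progress toward the goal and penalized for entering the safety
    margin around the nearest obstacle.
    """
    # Distance to the nearest obstacle (infinite if there are none).
    nearest = min((math.dist(pos, o) for o in obstacles), default=math.inf)
    if nearest <= collision_radius:
        return -1.0  # collision

    dist_now = math.dist(pos, goal)
    if dist_now <= goal_radius:
        return 1.0  # goal reached

    # Positive when the ship moved closer to the goal, negative otherwise.
    progress = math.dist(prev_pos, goal) - dist_now
    reward = progress_weight * progress

    # Linear penalty that grows as the ship gets closer to an obstacle.
    if nearest < safe_dist:
        reward -= proximity_weight * (safe_dist - nearest) / safe_dist
    return reward


if __name__ == "__main__":
    # One step toward the goal in open water: sparse gives no signal,
    # the shaped reward gives a small positive one.
    print(sparse_reward(reached_goal=False, collided=False))
    print(dense_reward((0, 0), (1, 0), (10, 0), obstacles=[(5, 5)]))
```

Under these assumptions, a policy-gradient learner such as PPO receives a learning signal on ordinary steps as well, instead of only when an episode ends in a collision or at the goal.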

Original language: English
Article number: 106689
Number of pages: 13
Journal: OCEAN AND COASTAL MANAGEMENT
Volume: 242
DOIs
Publication status: Published - 1 Aug 2023
MoE publication type: A1 Journal article-refereed

Keywords

  • Collision avoidance decision-making
  • Dense reward
  • G-IPOMDP-PPO
  • Mixed obstacle environments
  • POMDP
