TY - JOUR
T1 - A partially observable multi-ship collision avoidance decision-making model based on deep reinforcement learning
AU - Zheng, Kangjie
AU - Zhang, Xinyu
AU - Wang, Chengbo
AU - Zhang, Mingyang
AU - Cui, Hao
N1 - Funding Information:
This work was supported by the Dalian Science and Technology Innovation Fund (2022JJ12GX015).
Publisher Copyright:
© 2023 Elsevier Ltd
PY - 2023/8/1
Y1 - 2023/8/1
N2 - Unmanned ships have drawn widespread attention for their potential to enhance navigational safety, minimize human errors, and improve shipping efficiency. Nevertheless, the complexity and uncertainty of mixed obstacle environments present significant challenges to developing unmanned ships, particularly in collision avoidance decision-making. This paper proposes a new model using the Partially Observable Markov Decision Process (POMDP) to construct a collision avoidance decision-making model in mixed obstacle environments for autonomous ships, which can address the environment's complexity and uncertainty and improve decision accuracy. An image-state observation method is proposed as images can provide more accurate, rich, and reliable information. A dense reward function is designed to address the issue of sparse rewards in fitting the algorithm. The Proximal Policy Optimization (PPO) algorithm is utilized for model training. Based on this, a route guidance method called the PPO for POMDP with guidelines under dense reward (G-IPOMDP-PPO) is proposed, which can improve training efficiency. Simulations are conducted in various mixed obstacle environments and compared with conventional algorithms. The results show that the proposed model can safely and efficiently make collision avoidance decisions in complex and uncertain environments. This research provides a new solution and theoretical foundation for developing autonomous ships and can be extended to achieving dynamic interactive collision avoidance in mixed obstacle environments.
AB - Unmanned ships have drawn widespread attention for their potential to enhance navigational safety, minimize human errors, and improve shipping efficiency. Nevertheless, the complexity and uncertainty of mixed obstacle environments present significant challenges to developing unmanned ships, particularly in collision avoidance decision-making. This paper proposes a new model using the Partially Observable Markov Decision Process (POMDP) to construct a collision avoidance decision-making model in mixed obstacle environments for autonomous ships, which can address the environment's complexity and uncertainty and improve decision accuracy. An image-state observation method is proposed as images can provide more accurate, rich, and reliable information. A dense reward function is designed to address the issue of sparse rewards in fitting the algorithm. The Proximal Policy Optimization (PPO) algorithm is utilized for model training. Based on this, a route guidance method called the PPO for POMDP with guidelines under dense reward (G-IPOMDP-PPO) is proposed, which can improve training efficiency. Simulations are conducted in various mixed obstacle environments and compared with conventional algorithms. The results show that the proposed model can safely and efficiently make collision avoidance decisions in complex and uncertain environments. This research provides a new solution and theoretical foundation for developing autonomous ships and can be extended to achieving dynamic interactive collision avoidance in mixed obstacle environments.
KW - Collision avoidance decision-making
KW - Dense reward
KW - G-IPOMDP-PPO
KW - Mixed obstacle environments
KW - POMDP
UR - http://www.scopus.com/inward/record.url?scp=85163494137&partnerID=8YFLogxK
U2 - 10.1016/j.ocecoaman.2023.106689
DO - 10.1016/j.ocecoaman.2023.106689
M3 - Article
AN - SCOPUS:85163494137
SN - 0964-5691
VL - 242
JO - Ocean and Coastal Management
JF - Ocean and Coastal Management
M1 - 106689
ER -