Reinforcement Learning Based Underwater Wireless Optical Communication Alignment for Autonomous Underwater Vehicles

Yang Weng, Joni Pajarinen, Riad Akrour, Takumi Matsuda, Jan Peters, Toshihiro Maki

Research output: Contribution to journal › Article › Scientific › peer-review

1 Citation (Scopus)


With the developments in underwater wireless optical communication (UWOC) technology, UWOC can be used in conjunction with autonomous underwater vehicles (AUVs) for high-speed data sharing among the vehicle formation during underwater exploration. A beam alignment problem arises during communication due to the transmission range, external disturbances and noise, and uncertainties in the AUV dynamic model. In this article, we propose an acoustic navigation method to guide the alignment process without requiring beam directors, light intensity sensors, and/or scanning algorithms as used in previous research. The AUVs need to stably maintain a specific relative position and orientation to establish an optical link. We model the alignment problem as a partially observable Markov decision process (POMDP) that takes manipulation, navigation, and energy consumption of underwater vehicles into account. However, finding an efficient policy for the POMDP under high partial observability and environmental variability is challenging. Therefore, for successful policy optimization, we utilize the soft actor–critic reinforcement learning algorithm together with AUV-specific belief updates and reward-shaping-based curriculum learning. Our approach outperformed baseline approaches in a simulation environment and successfully performed the beam alignment process from one AUV to another on the real AUV Tri-TON 2.
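As a rough illustration of what reward-shaping-based curriculum learning can look like for an alignment task, the following Python sketch penalises position error, heading error, and thruster effort, and grants a bonus only when the vehicle is inside a tolerance that tightens as training progresses. It is not the authors' reward function: the function names, weights, tolerances, and the linear schedule are all hypothetical choices made for this example.

```python
def alignment_tolerance(progress, start=2.0, end=0.2):
    """Hypothetical schedule: position tolerance in metres, tightened
    linearly from `start` to `end` as training progress goes from 0 to 1."""
    p = min(max(progress, 0.0), 1.0)  # clamp progress to [0, 1]
    return start + (end - start) * p


def shaped_reward(pos_err, yaw_err, thrust, progress,
                  w_pos=1.0, w_yaw=0.5, w_energy=0.01):
    """Hypothetical shaped reward for one time step.

    pos_err  : distance from the desired relative position (metres)
    yaw_err  : heading error relative to the desired orientation (radians)
    thrust   : iterable of thruster commands, used as an energy proxy
    progress : fraction of training completed, in [0, 1]
    All weights are assumed values, not taken from the article.
    """
    energy = sum(u * u for u in thrust)  # quadratic effort penalty
    penalty = w_pos * pos_err + w_yaw * abs(yaw_err) + w_energy * energy
    # The bonus becomes harder to earn as the curriculum tightens the tolerance.
    bonus = 1.0 if pos_err <= alignment_tolerance(progress) else 0.0
    return bonus - penalty
```

Under these assumed values, a vehicle 1 m from the target with no heading error and no thrust earns the bonus early in training but not at the end, so the same state is rewarded less as the curriculum advances.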

Original language: English
Pages (from-to): 1-15
Number of pages: 15
Journal: IEEE Journal of Oceanic Engineering
Early online date: 19 Jul 2022
Publication status: E-pub ahead of print - 19 Jul 2022
MoE publication type: A1 Journal article-refereed


  • Acoustic beams
  • Acoustics
  • Adaptive optics
  • Autonomous underwater vehicle (AUV)
  • Optical beams
  • partially observable Markov decision process (POMDP)
  • Photonics
  • reinforcement learning (RL)
  • Scattering
  • soft actor–critic (SAC)
  • Task analysis
  • underwater wireless optical communication (UWOC)

