TY - JOUR
T1 - Guided-attention and gated-aggregation network for medical image segmentation
AU - Fiaz, Mustansar
AU - Noman, Mubashir
AU - Cholakkal, Hisham
AU - Anwer, Rao Muhammad
AU - Hanna, Jacob
AU - Khan, Fahad Shahbaz
N1 - Publisher Copyright: © 2024 Elsevier Ltd
PY - 2024/12
Y1 - 2024/12
N2 - Recently, transformers have been widely used in medical image segmentation to capture long-range and global dependencies using self-attention. However, they often struggle to learn local details, which limits their ability to capture the irregular shapes and sizes of tissues and the indistinct boundaries between them, both of which are critical for accurate segmentation. To alleviate this issue, we propose a network named GA2Net, which comprises an encoder, a bottleneck, and a decoder. The encoder computes multi-scale features. In the bottleneck, we propose hierarchical-gated features aggregation (HGFA), which introduces a novel spatial gating mechanism to enrich the multi-scale features. To effectively learn the shapes and sizes of the tissues, we apply deep supervision in the bottleneck. Within the decoder, GA2Net uses adaptive aggregation (AA) to adjust the receptive field for each location in the feature map, replacing the traditional concatenation/summation operations in the skip connections of U-Net-like architectures. Furthermore, we propose mask-guided feature attention (MGFA) modules within the decoder, which use foreground priors to learn salient features and capture the intricate structural and contour information of the tissues. We also apply intermediate supervision at each stage of the decoder, which further improves the model's ability to locate tissue boundaries. Our extensive experimental results show that GA2Net significantly outperforms existing state-of-the-art methods on eight medical image segmentation datasets: five polyp datasets, one skin lesion dataset, one multiple myeloma cell segmentation dataset, and one cardiac MRI dataset. We also perform an extensive ablation study to validate the capabilities of our method. Code is available at https://github.com/mustansarfiaz/ga2net.
AB - Recently, transformers have been widely used in medical image segmentation to capture long-range and global dependencies using self-attention. However, they often struggle to learn local details, which limits their ability to capture the irregular shapes and sizes of tissues and the indistinct boundaries between them, both of which are critical for accurate segmentation. To alleviate this issue, we propose a network named GA2Net, which comprises an encoder, a bottleneck, and a decoder. The encoder computes multi-scale features. In the bottleneck, we propose hierarchical-gated features aggregation (HGFA), which introduces a novel spatial gating mechanism to enrich the multi-scale features. To effectively learn the shapes and sizes of the tissues, we apply deep supervision in the bottleneck. Within the decoder, GA2Net uses adaptive aggregation (AA) to adjust the receptive field for each location in the feature map, replacing the traditional concatenation/summation operations in the skip connections of U-Net-like architectures. Furthermore, we propose mask-guided feature attention (MGFA) modules within the decoder, which use foreground priors to learn salient features and capture the intricate structural and contour information of the tissues. We also apply intermediate supervision at each stage of the decoder, which further improves the model's ability to locate tissue boundaries. Our extensive experimental results show that GA2Net significantly outperforms existing state-of-the-art methods on eight medical image segmentation datasets: five polyp datasets, one skin lesion dataset, one multiple myeloma cell segmentation dataset, and one cardiac MRI dataset. We also perform an extensive ablation study to validate the capabilities of our method. Code is available at https://github.com/mustansarfiaz/ga2net.
KW - Convolutional neural networks
KW - Deep supervision
KW - Mask-guided feature attention
KW - Medical image segmentation
KW - Multi-scale feature aggregation
KW - Transformers
UR - http://www.scopus.com/inward/record.url?scp=85200271374&partnerID=8YFLogxK
U2 - 10.1016/j.patcog.2024.110812
DO - 10.1016/j.patcog.2024.110812
M3 - Article
AN - SCOPUS:85200271374
SN - 0031-3203
VL - 156
SP - 1
EP - 13
JO - Pattern Recognition
JF - Pattern Recognition
M1 - 110812
ER -