Abstract
Current transformer-based change detection (CD) approaches either employ a model pretrained on the large-scale ImageNet image-classification dataset or first pretrain on another CD dataset and then fine-tune on the target benchmark. This strategy is driven by the fact that transformers typically require large amounts of training data to learn inductive biases, and standard CD datasets are too small to provide it. We develop an end-to-end CD approach with transformers that is trained from scratch and yet achieves state-of-the-art performance on five benchmarks. Instead of conventional self-attention, which struggles to capture inductive biases when trained from scratch, our architecture utilizes a shuffled sparse-attention operation that focuses on selected sparse informative regions to capture the inherent characteristics of the CD data. Moreover, we introduce a change-enhanced feature fusion (CEFF) module that fuses the features of the input image pair via per-channel re-weighting, enhancing the relevant semantic changes while suppressing the noisy ones. Extensive experiments on five CD datasets demonstrate the merits of the proposed contributions, with gains of up to 1.35% in intersection-over-union (IoU) score over the best published results in the literature. The code is available at https://github.com/mustansarfiaz/ScratchFormer.
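The per-channel re-weighting idea behind CEFF can be illustrated with a minimal NumPy sketch. The specific formulation below (a sigmoid of the mean absolute bi-temporal feature difference used as the channel weight, and the function name `ceff_fuse`) is an assumption for illustration only, not the paper's actual implementation; see the linked repository for the real module.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ceff_fuse(feat_a, feat_b):
    """Hypothetical CEFF-style fusion of two (C, H, W) feature maps.

    Channels whose responses differ more between the two time steps
    (i.e., likely change regions) receive larger weights, so relevant
    semantic changes are enhanced while stable/noisy channels are damped.
    """
    diff = np.abs(feat_a - feat_b)                 # (C, H, W) change energy
    w = sigmoid(diff.mean(axis=(1, 2)))            # (C,) per-channel weights
    fused = w[:, None, None] * (feat_a + feat_b)   # re-weighted fusion
    return fused
```

Note that with identical inputs the difference energy is zero, so every channel weight collapses to sigmoid(0) = 0.5 and the fusion reduces to the plain feature average.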
| Original language | English |
|---|---|
| Article number | 4704214 |
| Pages (from-to) | 1-14 |
| Number of pages | 14 |
| Journal | IEEE Transactions on Geoscience and Remote Sensing |
| Volume | 62 |
| DOIs | |
| Publication status | Published - 4 Apr 2024 |
| MoE publication type | A1 Journal article-refereed |
Keywords
- Change detection (CD)
- remote sensing
- transformers