Learning a Dynamic Cross-Modal Network for Multispectral Pedestrian Detection

Jin Xie, Rao Anwer, Hisham Cholakkal, Jing Nie, Jiale Cao, Jorma Laaksonen, Fahad Shahbaz Khan

Research output: Chapter in Book/Report/Conference proceeding > Conference article in proceedings > Scientific > peer-review


Multispectral pedestrian detection that enables continuous (day and night) localization of pedestrians has numerous applications. Existing approaches typically aggregate multispectral features by a simple element-wise operation. However, such a local feature aggregation scheme ignores the rich non-local contextual information. Further, we argue that a local tight correspondence across modalities is desired for multi-modal feature aggregation. To address these issues, we introduce a multispectral pedestrian detection framework that comprises a novel dynamic cross-modal network (DCMNet), which strives to adaptively utilize the local and non-local complementary information between multi-modal features. The proposed DCMNet consists of a local and a non-local feature aggregation module. The local module employs dynamically learned convolutions to capture local relevant information across modalities. On the other hand, the non-local module captures non-local cross-modal information by first projecting features from both modalities into the latent space and then obtaining dynamic latent feature nodes for feature aggregation. Comprehensive experiments are performed on two challenging benchmarks: KAIST and LLVIP. Experiments reveal the benefits of the proposed DCMNet, leading to consistently improved detection performance on diverse detection paradigms and backbones. When using the same backbone, our proposed detector achieves absolute gains of 1.74% and 1.90% over the baseline Cascade RCNN on the KAIST and LLVIP datasets.
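The contrast the abstract draws between simple element-wise aggregation and a dynamically learned local aggregation can be illustrated with a minimal sketch. This is not the DCMNet formulation. It assumes 1-D feature maps stored as lists of floats, and the rule that produces the per-position kernel (a softmax over negative squared differences between an RGB feature and its neighbouring thermal features) is a hypothetical stand-in for a learned kernel generator. The function names and the `radius` parameter are likewise illustrative.

```python
import math


def elementwise_fusion(rgb, thermal):
    """Baseline: fuse two 1-D feature maps by position-wise addition."""
    return [r + t for r, t in zip(rgb, thermal)]


def dynamic_local_fusion(rgb, thermal, radius=1):
    """Fuse with a per-position kernel computed from the inputs themselves.

    Assumption (not from the paper): the kernel weights at position i are
    a softmax over -(rgb[i] - thermal[j]) ** 2 for every j in the local
    window around i. A trained network would predict these weights.
    """
    n = len(rgb)
    fused = []
    for i in range(n):
        # Local window around position i, clipped at the borders.
        window = range(max(0, i - radius), min(n, i + radius + 1))
        # Input-dependent scores: thermal features closer to rgb[i] score higher.
        scores = [-(rgb[i] - thermal[j]) ** 2 for j in window]
        # Numerically stable softmax turns the scores into kernel weights.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Aggregate the thermal neighbourhood with the dynamic kernel.
        aggregated = sum(w * thermal[j] for w, j in zip(weights, window))
        fused.append(rgb[i] + aggregated)
    return fused
```

With `radius=0` the window contains only position `i`, so the sketch reduces to the element-wise baseline. Larger radii let each position draw on nearby thermal features with weights that change from input to input.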
Original language: English
Title of host publication: MM '22: Proceedings of the 30th ACM International Conference on Multimedia
ISBN (Electronic): 978-1-4503-9203-7
Publication status: Published - 10 Oct 2022
MoE publication type: A4 Conference publication
Event: ACM International Conference on Multimedia - Lisboa, Portugal
Duration: 10 Oct 2022 to 14 Oct 2022
Conference number: 30


Conference: ACM International Conference on Multimedia
Abbreviated title: MM

