Learning a Dynamic Cross-Modal Network for Multispectral Pedestrian Detection

Jin Xie, Rao Anwer, Hisham Cholakkal, Jing Nie, Jiale Cao, Jorma Laaksonen, Fahad Shahbaz Khan

Research output: Chapter in book/conference proceedings › Conference contribution › Scientific › peer-reviewed


Multispectral pedestrian detection that enables continuous (day and night) localization of pedestrians has numerous applications. Existing approaches typically aggregate multispectral features by a simple element-wise operation. However, such a local feature aggregation scheme ignores the rich non-local contextual information. Further, we argue that a local tight correspondence across modalities is desired for multi-modal feature aggregation. To address these issues, we introduce a multispectral pedestrian detection framework that comprises a novel dynamic cross-modal network (DCMNet), which strives to adaptively utilize the local and non-local complementary information between multi-modal features. The proposed DCMNet consists of a local and a non-local feature aggregation module. The local module employs dynamically learned convolutions to capture local relevant information across modalities. On the other hand, the non-local module captures non-local cross-modal information by first projecting features from both modalities into the latent space and then obtaining dynamic latent feature nodes for feature aggregation. Comprehensive experiments are performed on two challenging benchmarks: KAIST and LLVIP. Experiments reveal the benefits of the proposed DCMNet, leading to consistently improved detection performance on diverse detection paradigms and backbones. When using the same backbone, our proposed detector achieves absolute gains of 1.74% and 1.90% over the baseline Cascade RCNN on the KAIST and LLVIP datasets.
Title of host publication: MM '22: Proceedings of the 30th ACM International Conference on Multimedia
ISBN (electronic): 978-1-4503-9203-7
Status: Published - 10 Oct 2022
MoE publication type: A4 Article in conference proceedings
Event: ACM International Conference on Multimedia - Lisboa, Portugal
Duration: 10 Oct 2022 to 14 Oct 2022
Conference number: 30


Conference: ACM International Conference on Multimedia

