Heterogeneous non-local fusion for multimodal activity recognition

Petr Byvshev, Pascal Mettes, Yu Xiao

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

Abstract

In this work, we investigate activity recognition using multimodal inputs from heterogeneous sensors. Activity recognition is commonly tackled from a single-modal perspective using videos. In case multiple signals are used, they come from the same homogeneous modality, e.g. in the case of color and optical flow. Here, we propose an activity network that fuses multimodal inputs coming from completely different and heterogeneous sensors. We frame such a heterogeneous fusion as a non-local operation. The observation is that in a non-local operation, only the channel dimensions need to match. In the network, heterogeneous inputs are fused, while maintaining the shapes and dimensionalities that fit each input. We outline both asymmetric fusion, where one modality serves to enforce the other, and symmetric fusion variants. To further promote research into multimodal activity recognition, we introduce GloVid, a first-person activity dataset captured with video recordings and smart glove sensor readings. Experiments on GloVid show the potential of heterogeneous non-local fusion for activity recognition, outperforming individual modalities and standard fusion techniques.
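The observation that only the channel dimensions need to match can be illustrated with a minimal NumPy sketch. This is a generic attention-style non-local block, not the authors' network: the function `nonlocal_fuse`, the weight names, the feature shapes, the residual connection, and the scaled dot-product softmax are all assumptions made for illustration. Each modality is flattened to a list of positions with `c` channels; the number of positions differs per modality, yet each fused output keeps the shape of its own input.

```python
import numpy as np

def softmax(a, axis=-1):
    """Numerically stable softmax along the given axis."""
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def nonlocal_fuse(target, source, w_q, w_k, w_v):
    """Hypothetical asymmetric fusion: `source` reinforces `target`.

    target: (n_t, c) flattened positions of one modality
    source: (n_s, c) flattened positions of the other modality
    Only the channel count c has to agree; n_t and n_s are free.
    Returns an array with the same shape as `target`.
    """
    q = target @ w_q                      # (n_t, d)
    k = source @ w_k                      # (n_s, d)
    v = source @ w_v                      # (n_s, c)
    # Pairwise affinities between every target and source position.
    # The 1/sqrt(d) scaling is an assumed stabilising choice.
    attn = softmax(q @ k.T / np.sqrt(q.shape[1]), axis=1)  # (n_t, n_s)
    return target + attn @ v              # residual keeps target's shape

rng = np.random.default_rng(0)
c, d = 16, 8                               # assumed channel sizes
video = rng.normal(size=(4 * 7 * 7, c))    # e.g. T*H*W video positions
glove = rng.normal(size=(32, c))           # e.g. sensor time steps

def make_weights():
    return (rng.normal(size=(c, d)), rng.normal(size=(c, d)),
            rng.normal(size=(c, c)))

# Asymmetric: the glove signal reinforces the video features.
fused_video = nonlocal_fuse(video, glove, *make_weights())
# Symmetric variant: also fuse in the opposite direction.
fused_glove = nonlocal_fuse(glove, video, *make_weights())
```

In this sketch the symmetric variant simply applies the same operation in both directions with separate weights; how the two fused streams are combined afterwards is left open here.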

Original language: English
Title of host publication: ICMR 2020 - Proceedings of the 2020 International Conference on Multimedia Retrieval
Publisher: ACM
Pages: 63-72
Number of pages: 10
ISBN (Electronic): 9781450370875
DOIs: https://doi.org/10.1145/3372278.3390675
Publication status: Published - 8 Jun 2020
MoE publication type: A4 Article in a conference publication
Event: ACM International Conference on Multimedia Retrieval - Dublin, Ireland
Duration: 8 Jun 2020 → 11 Jun 2020
Conference number: 10

Conference

Conference: ACM International Conference on Multimedia Retrieval
Abbreviated title: ICMR
Country: Ireland
City: Dublin
Period: 08/06/2020 → 11/06/2020

Keywords

  • Activity recognition
  • Datasets
  • Heterogeneous modalities

  • Projects

    CEAMA: Cognitive Engine for Assembly and Maintenance Automation

    Xiao, Y., Lee, J., Byvshev, P., Pham, T., Nyman, P., Souza Leite, C., Pouta, E. & Wirtanen, S.

    01/08/2018 → 30/06/2020

    Project: Business Finland: New business from research ideas (TUTLI)

    Cite this

    Byvshev, P., Mettes, P., & Xiao, Y. (2020). Heterogeneous non-local fusion for multimodal activity recognition. In ICMR 2020 - Proceedings of the 2020 International Conference on Multimedia Retrieval (pp. 63-72). ACM. https://doi.org/10.1145/3372278.3390675