Top-down deep appearance attention for action recognition

Rao Muhammad Anwer*, Fahad Shahbaz Khan, Joost van de Weijer, Jorma Laaksonen

*Corresponding author for this work

Research output: Article in book/conference proceedings › Conference contribution › Scientific › Peer-reviewed

1 Citation (Scopus)


Recognizing human actions in videos is a challenging problem in computer vision. Recently, deep features based on convolutional neural networks have shown promising results for action recognition. In this paper, we investigate the problem of fusing deep appearance and motion cues for action recognition. We propose a video representation that combines deep appearance and motion based local convolutional features within the bag-of-deep-features framework. First, dense local convolutional features for appearance and motion are extracted from the spatial (RGB) and temporal (flow) networks, respectively. Both visual cues are processed in parallel by constructing separate visual vocabularies for appearance and motion. A category-specific appearance map is then learned to modulate the weights of the deep motion features. The proposed representation is discriminative and binds the deep local convolutional features to their spatial locations. Experiments are performed on two challenging datasets: the JHMDB dataset with 21 action classes and the ACT dataset with 43 categories. The results demonstrate that our approach outperforms both standard fusion approaches, early and late feature fusion. Furthermore, although our approach uses only action labels and does not exploit body part information, it achieves competitive performance compared to state-of-the-art approaches based on deep features.
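
The abstract does not spell out how the category-specific appearance map modulates the motion features, so the following is only a minimal sketch of one plausible reading. It assumes that each local feature carries an appearance-word index and a motion-word index, that the appearance map for a class is the posterior p(class | appearance word) estimated from labelled training features, and that each class gets its own motion histogram weighted by that posterior. The function names, the add-one smoothing, and the toy data are all hypothetical and are not taken from the paper.

```python
def class_posteriors(app_words, labels, n_app_words, n_classes):
    """Estimate p(class | appearance word) from labelled training features.

    Assumption: simple counting with add-one smoothing; the paper may
    learn its appearance map differently.
    """
    counts = [[1.0] * n_classes for _ in range(n_app_words)]
    for word, label in zip(app_words, labels):
        counts[word][label] += 1.0
    return [[v / sum(row) for v in row] for row in counts]


def attention_descriptor(app_words, mot_words, posteriors, n_mot_words):
    """Build one motion-word histogram per class for a single video.

    Each local feature votes for its motion word with weight
    p(class | its appearance word), so appearance modulates motion.
    """
    n_classes = len(posteriors[0])
    hists = [[0.0] * n_mot_words for _ in range(n_classes)]
    for a, m in zip(app_words, mot_words):
        for c in range(n_classes):
            hists[c][m] += posteriors[a][c]
    # Concatenate the class-specific histograms into one descriptor.
    return [v for hist in hists for v in hist]


# Toy example: 2 appearance words, 3 motion words, 2 action classes.
posteriors = class_posteriors([0, 0, 1, 1], [0, 0, 1, 1],
                              n_app_words=2, n_classes=2)
descriptor = attention_descriptor([0, 1, 0], [2, 2, 1],
                                  posteriors, n_mot_words=3)
print(descriptor)
```

Because the posteriors of each appearance word sum to one across classes, every local feature contributes a total weight of one to the descriptor; the attention only redistributes that weight among the class-specific histograms.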

Title: Image Analysis - 20th Scandinavian Conference, SCIA 2017, Proceedings
ISBN (print): 9783319591254
DOI (permanent links):
Status: Published - 2017
OKM publication type: A4 Article in conference proceedings
Event: Scandinavian Conference on Image Analysis - Tromso, Norway
Duration: 12 Jun 2017 to 14 Jun 2017
Conference number: 20


Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10269 LNCS
ISSN (print): 03029743
ISSN (electronic): 16113349


Conference: Scandinavian Conference on Image Analysis

