Deep Contextual Attention for Human-Object Interaction Detection

Tiancai Wang, Rao Muhammad Anwer, Muhammad Haris Khan, Fahad Shahbaz Khan, Yanwei Pang, Ling Shao, Jorma Laaksonen

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; Scientific; peer-review

79 Citations (Scopus)
154 Downloads (Pure)

Abstract

This work proposes to combine neural networks with the compositional hierarchy of human bodies for efficient and complete human parsing. We formulate the approach as a neural information fusion framework. Our model assembles the information from three inference processes over the hierarchy: direct inference (directly predicting each part of a human body using image information), bottom-up inference (assembling knowledge from constituent parts), and top-down inference (leveraging context from parent nodes). The bottom-up and top-down inferences explicitly model the compositional and decompositional relations in human bodies, respectively. In addition, the fusion of multi-source information is conditioned on the inputs, i.e., by estimating and considering the confidence of the sources. The whole model is end-to-end differentiable, explicitly modeling information flows and structures. Our approach is extensively evaluated on four popular datasets, outperforming the state of the art in all cases, with a fast processing speed of 23 fps. Our code and results have been released to help ease future research in this direction.
Original language: English
Title of host publication: Proceedings of the International Conference on Computer Vision (ICCV2019)
Publisher: IEEE
Pages: 5693-5701
Number of pages: 9
ISBN (Electronic): 978-1-7281-4803-8
ISBN (Print): 978-1-7281-4804-5
DOIs
Publication status: Published - Feb 2020
MoE publication type: A4 Article in a conference publication
Event: IEEE International Conference on Computer Vision - Seoul, Korea, Republic of
Duration: 27 Oct 2019 to 2 Nov 2019
http://iccv2019.thecvf.com/

Publication series

Name: Proceedings of the IEEE International Conference on Computer Vision
Volume: 2019-October
ISSN (Electronic): 1550-5499

Conference

Conference: IEEE International Conference on Computer Vision
Abbreviated title: ICCV
Country/Territory: Korea, Republic of
City: Seoul
Period: 27/10/2019 to 02/11/2019
Internet address
