This work proposes combining neural networks with the compositional hierarchy of human bodies for efficient and complete human parsing. We formulate the approach as a neural information fusion framework. Our model assembles information from three inference processes over the hierarchy: direct inference (predicting each part of a human body directly from image information), bottom-up inference (assembling knowledge from constituent parts), and top-down inference (leveraging context from parent nodes). The bottom-up and top-down inferences explicitly model the compositional and decompositional relations in human bodies, respectively. In addition, the fusion of multi-source information is conditioned on the inputs, i.e., it estimates the confidence of each source and weights the source accordingly. The whole model is end-to-end differentiable and explicitly models information flows and structures. Our approach is extensively evaluated on four popular datasets, outperforming the state of the art in all cases, with a fast processing speed of 23 fps. Our code and results have been released to ease future research in this direction.
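The confidence-conditioned fusion described in the abstract can be illustrated with a minimal sketch. It is not the released implementation: the sigmoid gate over a linear score, the function names (`confidence`, `fuse`), the two-dimensional toy features, and the gate weights are all assumptions made for illustration. Each of the three sources (direct, bottom-up, top-down) yields a feature for the same body-part node, a gate estimates how much that source should be trusted, and the fused feature is the gate-weighted sum.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def confidence(feature, weights):
    # Hypothetical gate: sigmoid of a linear score of the source feature.
    return sigmoid(sum(f * w for f, w in zip(feature, weights)))

def fuse(sources, gate_weights):
    # Estimate one confidence per source, then sum the gated features.
    gates = {name: confidence(feat, gate_weights[name])
             for name, feat in sources.items()}
    dim = len(next(iter(sources.values())))
    fused = [sum(gates[name] * sources[name][i] for name in sources)
             for i in range(dim)]
    return fused, gates

# Toy features for one node of the body hierarchy (illustrative values only).
sources = {
    "direct": [0.9, 0.1],      # predicted from image information
    "bottom_up": [0.6, 0.4],   # assembled from constituent parts
    "top_down": [0.2, 0.8],    # context from the parent node
}
# Illustrative gate parameters; in a trained model these would be learned.
gate_weights = {
    "direct": [2.0, 0.0],
    "bottom_up": [0.0, 0.0],
    "top_down": [-2.0, 0.0],
}

fused, gates = fuse(sources, gate_weights)
```

Because each gate is computed from its own input feature, the contribution of a source varies per input rather than being fixed, which is the sense in which the fusion is conditioned on the inputs. Every operation here is differentiable, so a gate of this kind could be trained end to end alongside the rest of a network.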
|Title of host publication||Proceedings of the International Conference on Computer Vision (ICCV2019)|
|Number of pages||9|
|Publication status||Published - Feb 2020|
|MoE publication type||A4 Article in a conference publication|
|Event||IEEE International Conference on Computer Vision - Seoul, Korea, Republic of (27 Oct 2019 → 2 Nov 2019)|
|Name||Proceedings of the IEEE International Conference on Computer Vision|
|Conference||IEEE International Conference on Computer Vision|
|Country||Korea, Republic of|
|Period||27/10/2019 → 02/11/2019|