TY - JOUR
T1 - Scale coding bag of deep features for human attribute and action recognition
AU - Khan, Fahad Shahbaz
AU - van de Weijer, Joost
AU - Anwer, Rao Muhammad
AU - Bagdanov, Andrew D.
AU - Felsberg, Michael
AU - Laaksonen, Jorma
PY - 2017
Y1 - 2017
N2 - Most approaches to human attribute and action recognition in still
images are based on image representation in which multi-scale local
features are pooled across scale into a single, scale-invariant
encoding. Both in bag-of-words and the recently popular representations
based on convolutional neural networks, local features are computed at
multiple scales. However, these multi-scale convolutional features are
pooled into a single scale-invariant representation. We argue that
entirely scale-invariant image representations are sub-optimal and
investigate approaches to scale coding within a bag of deep features
framework. Our approach encodes multi-scale information explicitly
during the image encoding stage. We propose two strategies to encode
multi-scale information explicitly in the final image representation. We
validate our two scale coding techniques on five datasets: Willow,
PASCAL VOC 2010, PASCAL VOC 2012, Stanford-40 and Human Attributes
(HAT-27). On all datasets, the proposed scale coding approaches
outperform both the scale-invariant method and the standard deep
features of the same network. Further, combining our scale coding
approaches with standard deep features leads to consistent improvement
over the state of the art.
AB - Most approaches to human attribute and action recognition in still
images are based on image representation in which multi-scale local
features are pooled across scale into a single, scale-invariant
encoding. Both in bag-of-words and the recently popular representations
based on convolutional neural networks, local features are computed at
multiple scales. However, these multi-scale convolutional features are
pooled into a single scale-invariant representation. We argue that
entirely scale-invariant image representations are sub-optimal and
investigate approaches to scale coding within a bag of deep features
framework. Our approach encodes multi-scale information explicitly
during the image encoding stage. We propose two strategies to encode
multi-scale information explicitly in the final image representation. We
validate our two scale coding techniques on five datasets: Willow,
PASCAL VOC 2010, PASCAL VOC 2012, Stanford-40 and Human Attributes
(HAT-27). On all datasets, the proposed scale coding approaches
outperform both the scale-invariant method and the standard deep
features of the same network. Further, combining our scale coding
approaches with standard deep features leads to consistent improvement
over the state of the art.
KW - Action recognition
KW - Attribute recognition
KW - Bag of deep features
UR - http://www.scopus.com/inward/record.url?scp=85029036140&partnerID=8YFLogxK
U2 - 10.1007/s00138-017-0871-1
DO - 10.1007/s00138-017-0871-1
M3 - Article
AN - SCOPUS:85029036140
VL - 29
SP - 55
EP - 71
JO - Machine Vision and Applications
JF - Machine Vision and Applications
SN - 0932-8092
IS - 1
ER -