In this chapter, we describe the structure and operation of the visual concept-detection subsystem of the PicSOM multimedia retrieval system. We evaluate several alternative techniques for implementing this component and present the essential results of a series of experiments in the large-scale setups of the TRECVID video retrieval evaluation campaigns of 2005, 2009, and 2014. Over these years, the PicSOM system has undergone substantial evolution in both the statistical features and the detection algorithms employed. The transition from global image features to bag-of-visual-words features, and more recently to features based on deep convolutional neural networks, is also justified in the light of our results. Overall, during its 10 years of participation in TRECVID, the PicSOM system has shown performance close to the state of the art in this rapidly developing field of research.
Title of host publication: Advances in Independent Component Analysis and Learning Machines
Editors: Ella Bingham, Samuel Kaski, Jorma Laaksonen, Jouko Lampinen
Place of publication: Amsterdam, The Netherlands
Publication status: Published - 2015
MoE publication type: A3 Book chapter