From pixels to semantics: visual concept detection and its applications

Mats Sjöberg

    Research output: Doctoral Thesis, Collection of Articles


    The amount of digital visual information available in the world today is enormous, and the rate at which more is continuously generated is staggering. For example, YouTube receives 100 hours of new video every minute, and Facebook more than 350 million new photos every day. At best, this represents the creativity and knowledge of millions or even billions of people, made available to the entire world thanks to the Internet. The problem, of course, is how to find the "needle" that is relevant to us in this enormous "haystack". Web search engines such as Google and Bing are decent solutions for finding textual content, but finding relevant visual content remains an unsolved problem. The core issue is the semantic gap between the raw visual data processed by computers and the abstract concepts and ideas humans use to communicate. This thesis studies one approach to this problem, namely using mid-level concepts to bridge the semantic gap. These semantic concepts, for example objects, locations, persons or events, are relatively concrete and thus comparatively easy to associate with the raw visual data. They can then be used to formulate more abstract queries, or to index and further organise an image or video database. An overview of semantic concept detection using machine learning techniques is presented here, together with some applications. A central issue is keeping computational speed and efficiency at a practical level for huge amounts of visual data while still producing accurate and relevant results. To this end, this thesis studies several fast approximate versions of the popular Support Vector Machine (SVM) algorithm, and proposes improvements to the fast Self-Organising Map (SOM) algorithm that increase its accuracy.
Several large-scale, real-world experimental applications are presented, including image retrieval using social network tags, video search, indoor location recognition, and semantic visualisation of large image and video databases. The empirical evidence presented in this thesis shows that, while the semantic gap problem is still not solved, the semantic concept approach produces concrete improvements in real-world applications. The proposed and evaluated improvements help make the machine learning algorithms faster and thus more practically useful for processing huge amounts of visual data.
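The abstract refers to the Self-Organising Map (SOM) without describing how it works. The code below is a minimal sketch of the standard online Kohonen SOM training rule in plain Python, for orientation only. It is not the fast SOM variant or the accuracy improvements studied in the thesis. The function names `train_som` and `best_matching_unit` are hypothetical. The linear decay schedules and the Gaussian grid neighbourhood are also illustrative assumptions.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid position (row, col) whose model vector is closest to x."""
    return min(
        weights,
        key=lambda unit: sum((w - v) ** 2 for w, v in zip(weights[unit], x)),
    )


def train_som(data, rows, cols, epochs=20, lr0=0.5, sigma0=None, seed=0):
    """Train a rows x cols SOM on equal-length feature vectors (illustrative sketch)."""
    rng = random.Random(seed)
    dim = len(data[0])
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0  # assumed initial neighbourhood radius
    # Initialise every map unit with a copy of a randomly chosen data vector.
    weights = {
        (r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)
    }
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)  # present the samples in a new random order each epoch
        for i in order:
            x = data[i]
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)  # learning rate decays linearly towards 0
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighbourhood radius shrinks
            bmu = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                # Gaussian neighbourhood on the 2-D map grid, centred at the BMU.
                grid_d2 = (r - bmu[0]) ** 2 + (c - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])  # pull the unit towards x
            step += 1
    return weights


# Usage: two small clusters of 2-D points mapped onto a 3 x 3 grid.
data = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [1.0, 1.0], [0.9, 1.0], [1.0, 0.9]]
som = train_som(data, rows=3, cols=3)
print(best_matching_unit(som, [0.0, 0.0]), best_matching_unit(som, [1.0, 1.0]))
```

In an image-database setting, `data` would typically hold image feature vectors. Each image could then be indexed by its best-matching unit, so that images with similar features tend to land on nearby units of the map. The nested loop over all units on every update is what makes this naive version slow for large data, which is the kind of cost that fast SOM variants aim to reduce.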
    Awarding institution
    • Aalto University
    Supervisors/advisors
    • Oja, Erkki, Supervising professor
    • Laaksonen, Jorma, Thesis advisor
    Print ISBN: 978-952-60-5900-6
    Electronic ISBN: 978-952-60-5901-3
    Status: Published - 2014
    Publication type (Finnish Ministry of Education and Culture classification): G5 Doctoral dissertation (article-based)

