Exploiting Scene Context for Image Captioning

Rakshith Shetty, Hamed Rezazadegan Tavakoli, Jorma Laaksonen

Research output: Article in a book/conference publication › Conference article in proceedings › Scientific › Peer-reviewed

6 Citations (Scopus)

Abstract

This paper presents a framework for image captioning that exploits scene context. To date, most captioning models have relied on a combination of Convolutional Neural Networks (CNNs) and the Long Short-Term Memory (LSTM) model, trained end-to-end. Recently, there has been extensive research on improving the language model and the CNN architecture, using attention mechanisms, and improving the learning techniques in such systems. A less studied area is the contribution of scene context to captioning. In this work, we study the role of scene context, consisting of the scene type and the objects present. To this end, we augment the CNN features with scene context features, including scene detectors, objects and their localization, and combinations of these. We use the scene context features as the initialization input at the zeroth time step of an LSTM model with deep residual connections. In subsequent time steps, however, the model uses the original CNN features. Contrary to more conventional language models, the proposed model thus has access to visual features throughout the whole process of sentence generation. We demonstrate that the scene context features affect language formation and improve the captioning results in the proposed framework. We also report results on the Microsoft COCO benchmark, where our model achieves state-of-the-art performance on the test set.
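The input scheme described in the abstract can be sketched in plain Python. This is a minimal, untrained sketch under stated assumptions, not the authors' implementation: the feature sizes, the random projections `ctx_proj` and `in_proj`, the three-layer depth, the helper names (`run_captioner`, `ResidualLSTMStack`, etc.), and the choice to add each layer's input to its output from the second layer upward are all hypothetical. It only traces the data flow: the scene-context-augmented feature enters the LSTM at step zero, and every later step pairs a word embedding with the original CNN feature.

```python
import math
import random

# Hypothetical sizes chosen for illustration only; the abstract gives no dimensions.
CNN_DIM, SCENE_DIM, OBJ_DIM = 8, 3, 4   # CNN, scene-detector, object/location features
EMB_DIM, IN_DIM, HID_DIM, VOCAB = 5, 6, 6, 10


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class LSTMCell:
    """Standard LSTM cell; each gate sees the concatenation [x; h_prev]."""

    def __init__(self, in_dim, hid_dim, rng):
        # One weight matrix per gate: input, forget, output, candidate (biases omitted).
        self.W = {g: rand_matrix(hid_dim, in_dim + hid_dim, rng) for g in "ifoc"}

    def step(self, x, h, c):
        z = x + h  # list concatenation, not addition
        i = [sigmoid(a) for a in matvec(self.W["i"], z)]
        f = [sigmoid(a) for a in matvec(self.W["f"], z)]
        o = [sigmoid(a) for a in matvec(self.W["o"], z)]
        g = [math.tanh(a) for a in matvec(self.W["c"], z)]
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new


class ResidualLSTMStack:
    """Stacked LSTM; from the second layer up, each layer's input is added to its output."""

    def __init__(self, num_layers, rng):
        self.cells = [LSTMCell(IN_DIM if k == 0 else HID_DIM, HID_DIM, rng)
                      for k in range(num_layers)]

    def initial_state(self):
        return [([0.0] * HID_DIM, [0.0] * HID_DIM) for _ in self.cells]

    def step(self, x, states):
        inp, new_states = x, []
        for k, (cell, (h, c)) in enumerate(zip(self.cells, states)):
            h_new, c_new = cell.step(inp, h, c)
            new_states.append((h_new, c_new))  # recurrent state keeps the cell's own output
            # Residual skip connection (assumption: applied from the second layer upward).
            inp = h_new if k == 0 else [a + b for a, b in zip(h_new, inp)]
        return inp, new_states


def run_captioner(cnn_feat, scene_feat, obj_feat, word_ids, rng, num_layers=3):
    """Feed scene context at t = 0 and (word, CNN feature) pairs at t >= 1.

    Weights are random and untrained; this only traces the data flow.
    """
    ctx_proj = rand_matrix(IN_DIM, CNN_DIM + SCENE_DIM + OBJ_DIM, rng)
    in_proj = rand_matrix(IN_DIM, EMB_DIM + CNN_DIM, rng)
    embed = rand_matrix(VOCAB, EMB_DIM, rng)
    lstm = ResidualLSTMStack(num_layers, rng)
    states = lstm.initial_state()

    # t = 0: CNN features augmented with scene-context features initialize the LSTM.
    out, states = lstm.step(matvec(ctx_proj, cnn_feat + scene_feat + obj_feat), states)
    outputs = [out]
    # t >= 1: each word embedding is paired with the original CNN features.
    for w in word_ids:
        out, states = lstm.step(matvec(in_proj, embed[w] + cnn_feat), states)
        outputs.append(out)
    return outputs  # top-layer outputs; a projection to VOCAB logits would follow


if __name__ == "__main__":
    rng = random.Random(0)
    cnn = [rng.random() for _ in range(CNN_DIM)]
    scene = [rng.random() for _ in range(SCENE_DIM)]
    objs = [rng.random() for _ in range(OBJ_DIM)]
    outs = run_captioner(cnn, scene, objs, word_ids=[1, 4, 2], rng=rng)
    print(len(outs), len(outs[0]))  # 4 6
```

The skip connection needs equal input and output widths, which is why every layer above the first uses `HID_DIM` on both sides in this sketch; with `num_layers=1` the residual branch is never taken.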
Original language: English
Title of host publication: Proceedings of the 2016 ACM workshop on Vision and Language Integration Meets Multimedia Fusion
Publisher: ACM
Pages: 1-8
ISBN (Electronic): 978-1-4503-4519-4
Publication status: Published - 2016
OKM publication type: A4 Article in conference proceedings
Event: ACM Multimedia - Amsterdam, Netherlands
Duration: 15 Oct 2016 → 19 Oct 2016
Conference number: 24

Conference

Conference: ACM Multimedia
Abbreviated title: ACMMM
Country/Territory: Netherlands
City: Amsterdam
Period: 15/10/2016 → 19/10/2016

  • Finnish Centre of Excellence in Computational Inference

    Xu, Y., Rintanen, J., Kaski, S., Anwer, R., Parviainen, P., Soare, M., Vuollekoski, H., Rezazadegan Tavakoli, H., Peltola, T., Blomstedt, P., Puranen, S., Dutta, R., Gebser, M., Mononen, T., Bogaerts, B., Tasharrofi, S., Pesonen, H., Weinzierl, A. & Yang, Z.

    01/01/2015 → 31/12/2017

    Project: Academy of Finland: Other research funding
