Image and Video Captioning with Augmented Neural Architectures

Rakshith Shetty, Hamed Rezazadegan Tavakoli, Jorma Laaksonen

Research output: Contribution to journal > Article > Scientific > peer-review


Neural-network-based image and video captioning can be substantially improved by utilizing architectures that make use of special features from the scene context, objects, and locations. A novel discriminatively trained evaluator network for choosing the best caption among those generated by an ensemble of caption generator networks further improves accuracy.
Original language: English
Pages (from-to): 34-46
Number of pages: 13
Journal: IEEE Multimedia
Issue number: 2
Publication status: Published - 2018
MoE publication type: A1 Journal article-refereed


  • computer vision
  • applications and expert knowledge-intensive systems
  • artificial intelligence
  • computing
  • deep learning
  • image captioning
  • recurrent networks

