PicSOM Experiments in TRECVID 2018

Mats Sjöberg, Hamed R. Tavakoli, Zhicun Xu, Héctor Laria Mantecón, Jorma Laaksonen

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; Professional

Abstract

This year, the PicSOM group participated only in the Video to Text (VTT) Description Generation subtask. For our submitted runs, we trained on either the MSR-VTT dataset alone or on MS COCO and MSR-VTT jointly. We used LSTM recurrent neural networks to generate descriptions based on multi-modal features extracted from the videos. We submitted four runs:

  • PICSOM_1: uses ResNet features for initialising the LSTM generator, and object and scene-type detection features as persistent input to the generator; trained on MS COCO + MSR-VTT.
  • PICSOM_2: uses ResNet and object detection features for initialisation; trained on MS COCO + MSR-VTT. This is the only run based on our new PyTorch codebase.
  • PICSOM_3: uses ResNet and video category features for initialisation, and trajectory and audio-visual embedding features as persistent features; trained on MSR-VTT only.
  • PICSOM_4: the same as PICSOM_3, except that the audio-visual embedding feature is replaced with audio class detection outputs.

The most significant difference between our runs came from expanding the original MSR-VTT training dataset with MS COCO, which contains images annotated with captions. A larger and more diverse training set seems to improve the performance measures more than using more advanced features does. Our ongoing post-submission experiments have also confirmed this finding.
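
The four run configurations described in the abstract can be summarised as plain data, which makes it easier to see what differs between any two runs. The following is a minimal sketch only: the class `RunConfig`, the helpers `diff` and `runs_trained_on`, and the short feature labels (for example `"audio_visual_embedding"`) are hypothetical names introduced here for illustration and are not taken from the PicSOM codebase. The abstract does not mention persistent features for PICSOM_2, so none are listed for it.

```python
from dataclasses import dataclass

# Hypothetical container for one submitted run; field names are illustrative.
@dataclass(frozen=True)
class RunConfig:
    name: str
    init_features: tuple        # features used to initialise the LSTM generator
    persistent_features: tuple  # features given as persistent input to the generator
    training_sets: tuple        # datasets used for training
    pytorch_codebase: bool = False

# Values transcribed from the abstract; labels are shorthand chosen for this sketch.
RUNS = {
    "PICSOM_1": RunConfig(
        "PICSOM_1",
        init_features=("resnet",),
        persistent_features=("object_detection", "scene_type_detection"),
        training_sets=("MS COCO", "MSR-VTT"),
    ),
    "PICSOM_2": RunConfig(
        "PICSOM_2",
        init_features=("resnet", "object_detection"),
        persistent_features=(),  # none stated in the abstract
        training_sets=("MS COCO", "MSR-VTT"),
        pytorch_codebase=True,
    ),
    "PICSOM_3": RunConfig(
        "PICSOM_3",
        init_features=("resnet", "video_category"),
        persistent_features=("trajectory", "audio_visual_embedding"),
        training_sets=("MSR-VTT",),
    ),
    "PICSOM_4": RunConfig(
        "PICSOM_4",
        init_features=("resnet", "video_category"),
        persistent_features=("trajectory", "audio_class_detection"),
        training_sets=("MSR-VTT",),
    ),
}

_COMPARED_FIELDS = (
    "init_features",
    "persistent_features",
    "training_sets",
    "pytorch_codebase",
)

def diff(a: RunConfig, b: RunConfig) -> dict:
    """Return the fields in which two runs differ, as {field: (a_value, b_value)}."""
    return {
        f: (getattr(a, f), getattr(b, f))
        for f in _COMPARED_FIELDS
        if getattr(a, f) != getattr(b, f)
    }

def runs_trained_on(dataset: str) -> list:
    """Return the names of the runs whose training data includes the given dataset."""
    return [name for name, run in RUNS.items() if dataset in run.training_sets]

if __name__ == "__main__":
    print(diff(RUNS["PICSOM_3"], RUNS["PICSOM_4"]))
    print(runs_trained_on("MS COCO"))
```

Comparing PICSOM_3 with PICSOM_4 in this way isolates the single swapped persistent feature, while `runs_trained_on("MS COCO")` picks out the two runs that used the expanded training set.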
Original language: English
Title of host publication: Proceedings of the TRECVID 2018 Workshop
Place of Publication: Gaithersburg, MD, USA
Publication status: Published - 1 Nov 2018
MoE publication type: D3 Professional conference proceedings
Event: International Workshop on Video Retrieval Evaluation - Gaithersburg, United States
Duration: 13 Nov 2018 to 15 Nov 2018

Workshop

Workshop: International Workshop on Video Retrieval Evaluation
Abbreviated title: TRECVID
Country: United States
City: Gaithersburg
Period: 13/11/2018 to 15/11/2018

  • Projects

    Equipment

    Science-IT

    Mikko Hakala (Manager)

    School of Science

    Facility/equipment: Facility

  • Cite this

    Sjöberg, M., Tavakoli, H. R., Xu, Z., Mantecón, H. L., & Laaksonen, J. (2018). PicSOM Experiments in TRECVID 2018. In Proceedings of the TRECVID 2018 Workshop. Gaithersburg, MD, USA.