PicSOM Experiments in TRECVID 2018

Mats Sjöberg, Hamed R. Tavakoli, Zhicun Xu, Héctor Laria Mantecón, Jorma Laaksonen

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Professional


Abstract

This year, the PicSOM group participated only in the Description Generation subtask of the Video to Text (VTT) task. For our submitted runs we trained either on the MSR-VTT dataset only, or on MS COCO and MSR-VTT jointly. We used LSTM recurrent neural networks to generate descriptions based on multi-modal features extracted from the videos. We submitted four runs:

• PICSOM_1: uses ResNet features for initialising the LSTM generator, and object and scene-type detection features as persistent input to the generator, which is trained on MS COCO + MSR-VTT.
• PICSOM_2: uses ResNet and object detection features for initialisation, and is trained on MS COCO + MSR-VTT. This is the only run based on our new PyTorch codebase.
• PICSOM_3: uses ResNet and video category features for initialisation, and trajectory and audio-visual embedding features as persistent features, trained on MSR-VTT only.
• PICSOM_4: the same as PICSOM_3, except that the audio-visual embedding feature has been replaced with audio class detection outputs.

The most significant difference between our runs came from expanding the original MSR-VTT training dataset with MS COCO, which contains images annotated with captions. A larger and more diverse training set seems to improve the performance measures more than using more advanced features does. This finding has also been confirmed by our post-submission experiments, which are still ongoing.
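The four runs differ only in which features initialise the generator, which features are fed persistently, and which datasets are used for training. The following is a minimal Python sketch of that comparison, not the PicSOM code: `RunConfig`, `differing_fields`, `step_inputs`, and the feature identifiers are hypothetical names chosen for illustration. The wiring in `step_inputs` (appending the persistent feature vector to every word vector) is an assumption about what "persistent input" means, since the abstract does not specify it.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical summary of one submitted run, as described in the abstract."""
    name: str
    init_features: Tuple[str, ...]        # used once, to initialise the LSTM
    persistent_features: Tuple[str, ...]  # fed to the LSTM at every step
    training_sets: Tuple[str, ...]


RUNS = (
    RunConfig("PICSOM_1", ("resnet",),
              ("object_detection", "scene_type"), ("MS COCO", "MSR-VTT")),
    # The abstract states no persistent features for PICSOM_2.
    RunConfig("PICSOM_2", ("resnet", "object_detection"),
              (), ("MS COCO", "MSR-VTT")),
    RunConfig("PICSOM_3", ("resnet", "video_category"),
              ("trajectory", "audio_visual_embedding"), ("MSR-VTT",)),
    RunConfig("PICSOM_4", ("resnet", "video_category"),
              ("trajectory", "audio_class_detection"), ("MSR-VTT",)),
)


def differing_fields(a: RunConfig, b: RunConfig) -> List[str]:
    """Return the names of the configuration fields in which two runs differ."""
    fields = ("init_features", "persistent_features", "training_sets")
    return [f for f in fields if getattr(a, f) != getattr(b, f)]


def step_inputs(word_vectors: Sequence[Sequence[float]],
                persistent: Sequence[float]) -> List[List[float]]:
    """Assumed wiring: append the persistent vector to each word vector."""
    return [list(w) + list(persistent) for w in word_vectors]


if __name__ == "__main__":
    print(differing_fields(RUNS[2], RUNS[3]))
    print([r.name for r in RUNS if "MS COCO" in r.training_sets])
```

Comparing PICSOM_3 with PICSOM_4 isolates a single feature swap, while comparing either of them with PICSOM_1 or PICSOM_2 changes both the features and the training data, so the abstract's conclusion about training set size is not a single-variable comparison.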
Original language: English
Title of host publication: Proceedings of the TRECVID 2018 Workshop
Place of Publication: Gaithersburg, MD, USA
Publication status: Published - 1 Nov 2018
MoE publication type: D3 Professional conference proceedings
Event: International Workshop on Video Retrieval Evaluation - Gaithersburg, United States
Duration: 13 Nov 2018 to 15 Nov 2018

Workshop

Workshop: International Workshop on Video Retrieval Evaluation
Abbreviated title: TRECVID
Country: United States
City: Gaithersburg
Period: 13/11/2018 to 15/11/2018
