PicSOM Experiments in TRECVID 2018

Mats Sjöberg, Hamed R. Tavakoli, Zhicun Xu, Héctor Laria Mantecón, Jorma Laaksonen

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceedings › Professional



This year, the PicSOM group participated only in the Video to Text (VTT) Description Generation subtask. For our submitted runs, we trained on either the MSR-VTT dataset alone or on MS COCO and MSR-VTT jointly. We used LSTM recurrent neural networks to generate descriptions from multi-modal features extracted from the videos. We submitted four runs:

• PICSOM_1: uses ResNet features to initialise the LSTM generator, and object and scene-type detection features as persistent input to the generator; trained on MS COCO + MSR-VTT.
• PICSOM_2: uses ResNet and object detection features for initialisation; trained on MS COCO + MSR-VTT. This is the only run based on our new PyTorch codebase.
• PICSOM_3: uses ResNet and video category features for initialisation, and trajectory and audio-visual embedding features as persistent input; trained on MSR-VTT only.
• PICSOM_4: the same as PICSOM_3, except that the audio-visual embedding feature is replaced with audio class detection outputs.

The most significant difference between our runs came from expanding the original MSR-VTT training dataset with MS COCO, which contains images annotated with captions. A larger and more diverse training set seems to improve the performance measures more than using more advanced features does. Our post-submission experiments, which are still ongoing, have also confirmed this finding.
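The distinction between initialisation features and persistent input features can be illustrated with a minimal PyTorch sketch. It is not the authors' implementation: the class name `CaptionLSTM`, all dimensions, and the specific wiring are assumptions made for illustration. Here "initialisation" is taken to mean that a feature vector sets the initial hidden and cell states, and "persistent input" is taken to mean that a feature vector is concatenated to the word embedding at every time step.

```python
import torch
import torch.nn as nn


class CaptionLSTM(nn.Module):
    """Hypothetical caption generator; names and sizes are illustrative only."""

    def __init__(self, vocab_size, init_dim, persist_dim,
                 embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Initialisation features (e.g. ResNet) are projected to the
        # initial hidden and cell states (an assumed design).
        self.init_h = nn.Linear(init_dim, hidden_dim)
        self.init_c = nn.Linear(init_dim, hidden_dim)
        # Persistent features are appended to the word embedding
        # at every step, so the cell input is wider than embed_dim.
        self.cell = nn.LSTMCell(embed_dim + persist_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, init_feat, persist_feat, tokens):
        # init_feat: (batch, init_dim), persist_feat: (batch, persist_dim)
        # tokens: (batch, seq_len) integer word indices (teacher forcing)
        h = torch.tanh(self.init_h(init_feat))
        c = torch.tanh(self.init_c(init_feat))
        logits = []
        for t in range(tokens.size(1)):
            x = torch.cat([self.embed(tokens[:, t]), persist_feat], dim=1)
            h, c = self.cell(x, (h, c))
            logits.append(self.out(h))
        # Result: (batch, seq_len, vocab_size) scores over the vocabulary
        return torch.stack(logits, dim=1)


if __name__ == "__main__":
    # Random stand-ins for real features; all sizes are made up.
    model = CaptionLSTM(vocab_size=1000, init_dim=2048, persist_dim=80)
    scores = model(torch.randn(2, 2048), torch.randn(2, 80),
                   torch.randint(0, 1000, (2, 5)))
    print(scores.shape)
```

Under these assumptions, a run that uses features only for initialisation, such as PICSOM_2, would correspond to dropping `persist_feat` from the concatenation, while the other runs differ in which feature vectors are fed to each of the two roles.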
Original language: English
Title of host publication: Proceedings of the TRECVID 2018 Workshop
Place of Publication: Gaithersburg, MD, USA
Publication status: Published - 1 Nov 2018
MoE publication type: D3 Professional conference proceedings
Event: International Workshop on Video Retrieval Evaluation - Gaithersburg, United States
Duration: 13 Nov 2018 → 15 Nov 2018


Workshop: International Workshop on Video Retrieval Evaluation
Abbreviated title: TRECVID
Country/Territory: United States

