PicSOM Experiments in TRECVID 2018

Mats Sjöberg, Hamed R. Tavakoli, Zhicun Xu, Héctor Laria Mantecón, Jorma Laaksonen

Research output: Article in book/conference proceedings › Conference contribution › Professional


Abstract

This year, the PicSOM group participated only in the Description Generation subtask of the Video to Text (VTT) task. For our submitted runs we trained either on the MSR-VTT dataset alone or on MS COCO and MSR-VTT jointly. We used LSTM recurrent neural networks to generate descriptions from multi-modal features extracted from the videos. We submitted four runs:

  • PICSOM_1: uses ResNet features for initialising the LSTM generator, and object and scene-type detection features as persistent input to the generator, which is trained on MS COCO + MSR-VTT.
  • PICSOM_2: uses ResNet and object detection features for initialisation and is trained on MS COCO + MSR-VTT. This is the only run based on our new PyTorch codebase.
  • PICSOM_3: uses ResNet and video category features for initialisation, and trajectory and audio-visual embedding features as persistent features; trained on MSR-VTT only.
  • PICSOM_4: the same as PICSOM_3, except that the audio-visual embedding feature is replaced with audio class detection outputs.

The most significant difference between our runs came from expanding the original MSR-VTT training dataset with MS COCO, which contains images annotated with captions. Having a larger and more diverse training set seems to improve the performance measures more than using more advanced features. Our post-submission experiments, which are still ongoing, have also confirmed this finding.
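
The four run configurations described in the abstract can be summarised as a small data structure. The following is only an illustrative sketch, not the PicSOM code: the names `RunConfig`, `RUNS`, and `route_features`, the feature keys, and the toy feature values are all hypothetical. Combining features by plain concatenation is also an assumption, since the abstract does not say how the features are merged. PICSOM_2 is given no persistent features because the abstract mentions none for that run.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical summary of one submitted run, based on the abstract."""
    name: str
    init_features: Tuple[str, ...]        # used to initialise the LSTM generator
    persistent_features: Tuple[str, ...]  # given to the generator at every step
    training_sets: Tuple[str, ...]


RUNS = (
    RunConfig("PICSOM_1", ("resnet",),
              ("object_detection", "scene_type"), ("MS COCO", "MSR-VTT")),
    # The abstract lists no persistent features for PICSOM_2.
    RunConfig("PICSOM_2", ("resnet", "object_detection"),
              (), ("MS COCO", "MSR-VTT")),
    RunConfig("PICSOM_3", ("resnet", "video_category"),
              ("trajectory", "audio_visual_embedding"), ("MSR-VTT",)),
    RunConfig("PICSOM_4", ("resnet", "video_category"),
              ("trajectory", "audio_class"), ("MSR-VTT",)),
)


def route_features(run: RunConfig,
                   features: Dict[str, List[float]],
                   num_steps: int) -> Tuple[List[float], List[List[float]]]:
    """Split a video's features into an initialisation vector and per-step inputs.

    Assumption: features of each kind are simply concatenated.
    """
    init_vec = [x for name in run.init_features for x in features[name]]
    persistent_vec = [x for name in run.persistent_features for x in features[name]]
    # The same persistent vector is repeated for every generation step.
    per_step = [list(persistent_vec) for _ in range(num_steps)]
    return init_vec, per_step


if __name__ == "__main__":
    # Toy feature values, purely for illustration.
    toy = {
        "resnet": [0.1, 0.2],
        "object_detection": [0.7],
        "scene_type": [0.9],
        "video_category": [1.0],
        "trajectory": [0.5],
        "audio_visual_embedding": [0.3, 0.4],
        "audio_class": [0.0, 1.0],
    }
    for run in RUNS:
        init_vec, per_step = route_features(run, toy, num_steps=3)
        print(run.name, run.training_sets, len(init_vec), len(per_step[0]))
```

Laid out this way, the comparison in the abstract is easy to see: PICSOM_3 and PICSOM_4 differ only in one persistent feature, while PICSOM_1 and PICSOM_2 differ from them in both the feature set and the inclusion of MS COCO in training.
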
Original language: English
Title of host publication: Proceedings of the TRECVID 2018 Workshop
Place of publication: Gaithersburg, MD, USA
Publication status: Published - 1 November 2018
Ministry of Education and Culture (OKM) publication type: D3 Professional conference proceedings
Event: International Workshop on Video Retrieval Evaluation - Gaithersburg, United States
Duration: 13 November 2018 to 15 November 2018

Workshop

Workshop: International Workshop on Video Retrieval Evaluation
Abbreviated title: TRECVID
Country: United States
City: Gaithersburg
Period: 13/11/2018 to 15/11/2018


  • Equipment

    Science-IT

    Mikko Hakala (Manager)

    School of Science

    Facility/equipment: Facility

  • Cite this

    Sjöberg, M., Tavakoli, H. R., Xu, Z., Mantecón, H. L., & Laaksonen, J. (2018). PicSOM Experiments in TRECVID 2018. In Proceedings of the TRECVID 2018 Workshop. Gaithersburg, MD, USA.