Abstract
This year, the PicSOM group participated only in the Description Generation subtask of Video to Text (VTT). For our submitted runs we trained on either the MSR-VTT dataset alone or MS COCO and MSR-VTT jointly. We used LSTM recurrent neural networks to generate descriptions based on multi-modal features extracted from the videos. We submitted four runs:

- PICSOM_1: uses ResNet features for initialising the LSTM generator, and object and scene-type detection features as persistent input to the generator, which is trained on MS COCO + MSR-VTT.
- PICSOM_2: uses ResNet and object detection features for initialisation, and is trained on MS COCO + MSR-VTT; this is the only run based on our new PyTorch codebase.
- PICSOM_3: uses ResNet and video category features for initialisation, and trajectory and audio-visual embedding features as persistent features; trained on MSR-VTT only.
- PICSOM_4: is the same as PICSOM_3 except that the audio-visual embedding feature has been replaced with audio class detection outputs.

The most significant difference between our runs came from expanding the original MSR-VTT training dataset with MS COCO, which contains images annotated with captions. Having a larger and more diverse training set seems to improve the performance measures more than using more advanced features does. Our post-submission experiments, which are still ongoing, have also confirmed this finding.
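The four run configurations described in the abstract can be written down as structured data, which makes it easier to see which runs differ in features and which in training data. The following is only a minimal sketch in Python: the names `RunConfig`, `RUNS` and `differing_fields` are hypothetical and do not come from the PicSOM codebase, and the empty persistent-feature tuple for PICSOM_2 is an assumption that simply reflects that the abstract lists none for that run.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class RunConfig:
    """One submitted run, as described in the abstract (hypothetical structure)."""
    name: str
    init_features: Tuple[str, ...]        # features used to initialise the LSTM
    persistent_features: Tuple[str, ...]  # features given as persistent input
    training_sets: Tuple[str, ...]


RUNS = (
    RunConfig("PICSOM_1", ("ResNet",),
              ("object detection", "scene-type detection"),
              ("MS COCO", "MSR-VTT")),
    # Assumption: the abstract lists no persistent features for PICSOM_2.
    RunConfig("PICSOM_2", ("ResNet", "object detection"),
              (),
              ("MS COCO", "MSR-VTT")),
    RunConfig("PICSOM_3", ("ResNet", "video category"),
              ("trajectory", "audio-visual embedding"),
              ("MSR-VTT",)),
    RunConfig("PICSOM_4", ("ResNet", "video category"),
              ("trajectory", "audio class detection"),
              ("MSR-VTT",)),
)


def differing_fields(a: RunConfig, b: RunConfig) -> List[str]:
    """Return the names of the configuration fields in which two runs differ."""
    fields = ("init_features", "persistent_features", "training_sets")
    return [f for f in fields if getattr(a, f) != getattr(b, f)]


# Runs whose training data includes MS COCO.
coco_runs = [r.name for r in RUNS if "MS COCO" in r.training_sets]
```

For example, `differing_fields(RUNS[2], RUNS[3])` reports only the persistent features, matching the statement that PICSOM_4 differs from PICSOM_3 in a single feature.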
Original language | English |
---|---|
Title of host publication | Proceedings of the TRECVID 2018 Workshop |
Place of Publication | Gaithersburg, MD, USA |
Publication status | Published - 1 Nov 2018 |
MoE publication type | D3 Professional conference proceedings |
Event | International Workshop on Video Retrieval Evaluation, Gaithersburg, United States |
Duration | 13 Nov 2018 → 15 Nov 2018 |
Workshop
Workshop | International Workshop on Video Retrieval Evaluation |
---|---|
Abbreviated title | TRECVID |
Country/Territory | United States |
City | Gaithersburg |
Period | 13/11/2018 → 15/11/2018 |
Fingerprint
Dive into the research topics of 'PicSOM Experiments in TRECVID 2018'. Together they form a unique fingerprint.
Projects
- 1 Finished
- MeMAD: Methods for Managing Audiovisual Data: Combining Automatic Efficiency with Human Accuracy
Kurimo, M., Grósz, T., Raitio, R., Rouhe, A., Brander, T., Grönroos, S., Porjazovski, D. & Virkkunen, A.
27/12/2017 → 31/03/2021
Project: EU: Framework programmes funding