PicSOM and EURECOM Experiments in TRECVID 2019

Hector Laria Mantecon, Jorma Laaksonen, Danny Francis, Benoit Huet

Research output: Chapter in Book/Report/Conference proceeding > Conference article in proceedings > Professional

Abstract

This year, the PicSOM and EURECOM teams participated only in the Description Generation subtask of the Video to Text Description (VTT) task. Each group submitted one or two runs labeled as "MeMAD" submissions, named after a joint EU H2020 research project. In total, the PicSOM team submitted four runs and EURECOM one run. The goal of the PicSOM submissions was to study the effect of using image features, video features, or both. The goal of the EURECOM submission was to experiment with the use of Curriculum Learning in video captioning. The five submitted runs are as follows:

• PICSOM.1-MEMAD.PRIMARY: uses ResNet and I3D features for initialising the LSTM generator, and is trained on MS COCO + TGIF using self-critical loss,
• PICSOM.2-MEMAD: uses I3D features as initialisation, and is trained on TGIF using self-critical loss,
• PICSOM.3: uses ResNet features as initialisation, and is trained on MS COCO + TGIF using self-critical loss,
• PICSOM.4: is the same as PICSOM.1-MEMAD.PRIMARY except that the loss function used is cross-entropy,
• EURECOM.MEMAD.PRIMARY: uses I3D features to initialise a GRU generator, and is trained on TGIF + MSR-VTT + MSVD with cross-entropy and curriculum learning.

The runs aim at comparing the cross-entropy and self-critical training loss functions and at showing whether one can successfully use both still image and video features even though the COCO dataset does not allow the extraction of I3D video features. The results suggest that using both video and still image features yields better captioning results than using either modality alone. The proposed Curriculum Learning process does not seem to be beneficial.

Original language: English
Title of host publication: Proceedings of TRECVID 2019
Publisher: National Institute of Standards and Technology (NIST)
Number of pages: 6
Publication status: Published - Nov 2019
MoE publication type: D3 Professional conference proceedings
Event: International Workshop on Video Retrieval Evaluation - Gaithersburg, United States
Duration: 12 Nov 2019 to 13 Nov 2019
Conference number: 23

Workshop

Workshop: International Workshop on Video Retrieval Evaluation
Abbreviated title: TRECVID
Country/Territory: United States
City: Gaithersburg
Period: 12/11/2019 to 13/11/2019
