Abstract
Dense video captioning (VC) aims at generating a paragraph-long description for the events in video segments. Building on their success in language modeling, Transformer-based models for VC have also proven effective at modeling cross-domain video-text representations with cross-attention (Xatt). Despite Xatt's effectiveness, the queries and outputs of attention, which come from different domains, tend to be only weakly related. In this paper, we argue that this weak relatedness, or domain discrepancy, could impede a model from learning meaningful cross-domain representations. Hence, we propose a simple yet effective Post-Attention Modulator (PAM) that post-processes Xatt's outputs to narrow the discrepancy. Specifically, PAM modulates and enhances the average similarity over Xatt's queries and outputs. The modulated similarities are then used as a weighting basis to interpolate PAM's outputs. In our experiments, PAM was applied to two strong VC baselines, VTransformer and MART, with two different video features, on the well-known VC benchmark datasets ActivityNet Captions and YouCookII. According to the results, the proposed PAM brings consistent improvements, e.g., of up to 14.5% in CIDEr-D, as well as in the other metrics considered, BLEU and METEOR.
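To make the mechanism described in the abstract more concrete, here is a minimal, hypothetical sketch in plain Python. `cross_attention` is standard single-head scaled dot-product attention, with queries from one domain (e.g. text) and keys/values from the other (e.g. video). `post_attention_modulator` is only an illustration loosely following the abstract's description, not the paper's formulation: it averages the query/output cosine similarity, modulates it with a sigmoid, and uses the result as an interpolation weight. The sigmoid, the parameters `scale` and `shift`, and the choice to interpolate between the Xatt outputs and the queries are all assumptions.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def _dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Single-head scaled dot-product cross-attention (Xatt).

    Returns one output row per query row."""
    d = len(queries[0])
    outputs = []
    for q in queries:
        scores = [_dot(q, k) / math.sqrt(d) for k in keys]
        m = max(scores)  # subtract the max score for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]  # softmax over the keys
        # Weighted sum of the value rows, computed column by column.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return outputs


def _cosine(a: List[float], b: List[float]) -> float:
    # Assumes non-zero vectors.
    return _dot(a, b) / (math.sqrt(_dot(a, a)) * math.sqrt(_dot(b, b)))


def post_attention_modulator(
    queries: Matrix, outputs: Matrix, scale: float = 5.0, shift: float = 0.0
) -> Tuple[Matrix, float]:
    """Hypothetical PAM-style post-processing (NOT the paper's exact method).

    1. Average the query/output cosine similarity over all query positions.
    2. Modulate it with a sigmoid; `scale` and `shift` are hypothetical
       parameters that a real model would presumably learn.
    3. Use the modulated similarity `alpha` as the weight for interpolating
       between the Xatt outputs and the queries (an assumed choice)."""
    avg_sim = sum(_cosine(q, o) for q, o in zip(queries, outputs)) / len(queries)
    alpha = 1.0 / (1.0 + math.exp(-(scale * avg_sim + shift)))
    blended = [
        [alpha * o_j + (1.0 - alpha) * q_j for q_j, o_j in zip(q, o)]
        for q, o in zip(queries, outputs)
    ]
    return blended, alpha
```

In this sketch, queries and outputs that are strongly related (average similarity close to 1) give an `alpha` close to 1, so the result stays close to the Xatt outputs; weakly related ones give a smaller `alpha` and pull the result toward the queries.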
Original language | English |
---|---|
Host publication | Proceedings of the 26th International Conference on Pattern Recognition (ICPR) |
Publisher | IEEE |
Pages | 1536-1542 |
ISBN (electronic) | 978-1-6654-9062-7 |
DOIs | |
Status | Published - 2022 |
Publication type (OKM) | A4 Article in conference proceedings |
Event | International Conference on Pattern Recognition - Montreal, Canada. Duration: 21 Aug 2022 → 25 Aug 2022. Conference number: 26 |
Publication series
Name | International Conference on Pattern Recognition |
---|---|
ISSN (print) | 1051-4651 |
Conference
Conference | International Conference on Pattern Recognition |
---|---|
Acronym | ICPR |
Country/Territory | Canada |
City | Montreal |
Period | 21/08/2022 → 25/08/2022 |
Fingerprint
Dive into the research topics of 'Post-Attention Modulator for Dense Video Captioning'. Together they form a unique fingerprint.
More accurate speech and video recognition with eyes and ears open
Laaksonen, J., Wang, T., Guo, Z. & Pehlivan Tort, S.
01/01/2022 → 31/12/2024
Project: Academy of Finland: Other research funding
-
Finland shaped by films: Finnish fiction film as audiovisual big data, 1907-2017
Laaksonen, J., Pehlivan Tort, S. & Wang, T.
01/01/2020 → 31/12/2022
Project: Academy of Finland: Other research funding
-
Artificial intelligence in the estimation of forest biomass and structure
Laaksonen, J., Anwer, R., Guo, Z. & Wang, T.
01/01/2018 → 31/12/2022
Project: Academy of Finland: Other research funding