Abstract
Image Difference Captioning (IDC) aims to generate sentences that describe the differences between two similar-looking images. Conventional approaches learn an IDC model with a pre-trained and usually frozen visual feature extractor. Two major issues may arise as a result: (1) a large domain gap usually exists between the datasets used to pre-train such a visual encoder and those of the downstream IDC task, and (2) the visual feature extractor, when encoding the two images separately, often fails to encode the visual changes between them effectively. Motivated by the excellent zero-shot performance of the recently proposed CLIP, we propose CLIP4IDC, which transfers a CLIP model to the IDC task to address these issues. Rather than directly fine-tuning CLIP to generate sentences, we introduce an adaptation training process that adapts CLIP’s visual encoder to capture and align differences in image pairs based on the textual descriptions. Experiments on three IDC benchmark datasets, CLEVR-Change, Spot-the-Diff, and Image-Editing-Request, demonstrate the effectiveness of CLIP4IDC.
Original language | English |
---|---|
Title of host publication | Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (AACL-IJCNLP) |
Publisher | Association for Computational Linguistics |
Pages | 33-42 |
Volume | 2 |
ISBN (Electronic) | 978-1-955917-64-3 |
Publication status | Published - Nov 2022 |
MoE publication type | A4 Article in a conference publication |
Event | 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing - Virtual, Online. Duration: 20 Nov 2022 → 23 Nov 2022 |
Conference
Conference | 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing |
---|---|
Abbreviated title | AACL-IJCNLP |
City | Virtual, Online |
Period | 20/11/2022 → 23/11/2022 |
Fingerprint
Dive into the research topics of 'CLIP4IDC: CLIP for Image Difference Captioning'. Together they form a unique fingerprint.
Projects
USSEE: Understanding speech and scene with ears and eyes (USSEE)
Laaksonen, J., Pehlivan Tort, S., Wang, T., Guo, Z., Tiwari, H. & Arora, P.
01/01/2022 → 31/12/2024
Project: Academy of Finland: Other research funding
Movie Making Finland: Finnish fiction films as audiovisual big data, 1907-2017
Laaksonen, J., Wang, T. & Pehlivan Tort, S.
01/01/2020 → 31/12/2022
Project: Academy of Finland: Other research funding
Artificial Intelligence for Retrieval of Forest Biomass & Structure
Laaksonen, J., Anwer, R., Wang, T. & Guo, Z.
01/01/2018 → 31/12/2022
Project: Academy of Finland: Other research funding