Towards Cycle-Consistent Models for Text and Image Retrieval

Marcella Cornia, Lorenzo Baraldi, Hamed Rezazadegan Tavakoli, Rita Cucchiara

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

2 Citations (Scopus)


Cross-modal retrieval has recently become a research hot spot, thanks to the development of deeply learnable architectures. Such architectures generally learn a joint multi-modal embedding space into which text and images can be projected and compared. Here we investigate a different approach and reformulate cross-modal retrieval as the problem of learning a translation between the textual and visual domains. In particular, we propose an end-to-end trainable model that translates text into image features and vice versa, and regularizes this mapping with a cycle-consistency criterion. Preliminary experimental evaluations show promising results with respect to ordinary visual-semantic models.
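The cycle-consistency idea in the abstract can be illustrated with a toy sketch. The record does not describe the actual architecture, so everything below is an assumption made for illustration: the two translators are plain linear maps, the cycle penalty is a squared distance, retrieval uses cosine similarity, and all names (`cycle_loss`, `rank_images`, `W_t2i`, `W_i2t`) are hypothetical.

```python
import math

def matvec(W, x):
    """Apply a linear map W (given as a list of rows) to a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    na = math.sqrt(sum(ai * ai for ai in a))
    nb = math.sqrt(sum(bi * bi for bi in b))
    return sum(ai * bi for ai, bi in zip(a, b)) / (na * nb)

def cycle_loss(W_t2i, W_i2t, text_vec, image_vec):
    """Penalty for failing to recover the input after a round trip.

    Assumed form: text -> image -> text plus image -> text -> image.
    """
    text_cycle = matvec(W_i2t, matvec(W_t2i, text_vec))
    image_cycle = matvec(W_t2i, matvec(W_i2t, image_vec))
    return sq_dist(text_cycle, text_vec) + sq_dist(image_cycle, image_vec)

def rank_images(W_t2i, text_vec, image_vecs):
    """Translate a text query into image-feature space and rank images."""
    query = matvec(W_t2i, text_vec)
    scores = [cosine(query, v) for v in image_vecs]
    return sorted(range(len(image_vecs)), key=lambda i: scores[i], reverse=True)

# Toy 2-D features: the text-to-image map swaps coordinates,
# and the image-to-text map is its exact inverse.
W_t2i = [[0.0, 1.0], [1.0, 0.0]]
W_i2t = [[0.0, 1.0], [1.0, 0.0]]
text = [1.0, 0.0]
images = [[1.0, 0.0], [0.0, 1.0]]

loss = cycle_loss(W_t2i, W_i2t, text, images[1])
ranking = rank_images(W_t2i, text, images)

# A pair of maps that are not mutual inverses incurs a positive penalty.
W_bad = [[2.0, 0.0], [0.0, 2.0]]
bad_loss = cycle_loss(W_t2i, W_bad, text, images[1])
```

In this toy setting the round trip is exact, so the penalty vanishes, while the mismatched pair is penalized. In a trainable model such a term would be added to a retrieval objective, which is omitted here.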
Original language: English
Title of host publication: Computer Vision – ECCV 2018 Workshops, Proceedings
Editors: Laura Leal-Taixé, Stefan Roth
Number of pages: 5
ISBN (Electronic): 978-3-030-11018-5
Publication status: Published - 1 Jan 2019
MoE publication type: A4 Article in a conference publication
Event: European Conference on Computer Vision - Munich, Germany
Duration: 8 Sep 2018 - 14 Sep 2018
Conference number: 15

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11132 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Conference: European Conference on Computer Vision
Abbreviated title: ECCV


  • Cross-modal retrieval
  • Cycle consistency
  • Visual-semantic models

