A strong and robust baseline for text-image matching

Fangyu Liu, Rongtian Ye

Research output: Article in book/conference proceedings › Conference article in proceedings › Scientific › Peer-reviewed

9 Citations (Scopus)

Abstract

We review current schemes for text-image matching models and propose improvements to both training and inference. First, we empirically show the limitations of two popular losses (the sum-margin and max-margin losses) widely used to train text-image embeddings, and propose a trade-off between them: a kNN-margin loss, which 1) exploits information from hard negatives and 2) is robust to noise, because it takes all K hardest negatives into account and thereby tolerates pseudo-negatives and outliers. Second, we advocate using Inverted Softmax (IS) and Crossmodal Local Scaling (CSLS) at inference time to mitigate the so-called hubness problem in high-dimensional embedding spaces, which improves scores on all metrics by a large margin.
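
The kNN-margin loss described above can be illustrated with a minimal NumPy sketch. It is written under our own assumptions and is not the authors' code: the function name `knn_margin_loss`, the default `margin=0.2` and `k=3`, and the input layout (a square batch similarity matrix `sim` whose diagonal holds the matching image-caption pairs) are all hypothetical. For each query it sums only the K largest hinge violations, in both retrieval directions.

```python
import numpy as np


def knn_margin_loss(sim: np.ndarray, margin: float = 0.2, k: int = 3) -> float:
    """Hinge loss summed over the k hardest negatives in both retrieval directions.

    sim[i, j] is the similarity of image i and caption j; sim[i, i] is the
    matching (positive) pair. Hypothetical sketch, not the authors' code.
    """
    n = sim.shape[0]
    pos = np.diag(sim)
    # Image i as query: margin violation by each negative caption j.
    cost_caption = np.maximum(0.0, margin - pos[:, None] + sim)
    # Caption j as query: margin violation by each negative image i.
    cost_image = np.maximum(0.0, margin - pos[None, :] + sim)
    # Positive pairs are not negatives, so they contribute no cost.
    np.fill_diagonal(cost_caption, 0.0)
    np.fill_diagonal(cost_image, 0.0)
    k = max(1, min(k, n - 1))
    # Keep the k largest violations per query: rows for images, columns for captions.
    top_caption = np.sort(cost_caption, axis=1)[:, -k:]
    top_image = np.sort(cost_image, axis=0)[-k:, :]
    return float(top_caption.sum() + top_image.sum())
```

With `k=1` this reduces to the max-margin loss (only the single hardest negative counts), and with `k=n-1` to the sum-margin loss (every negative counts). Intermediate values keep the signal from hard negatives while diluting the influence of any single mislabeled or noisy negative.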

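Both inference-time corrections operate on a precomputed query-by-candidate similarity matrix and can be sketched as below. This is an illustration under stated assumptions, not the paper's implementation: the function names, the inverse temperature `beta` and the neighbourhood size `k` are hypothetical choices. Inverted softmax normalizes each candidate's scores over all queries, so a hub candidate that is close to every query has its probability mass spread thin; CSLS subtracts from twice the similarity the mean similarity of the query, and of the candidate, to their k nearest neighbours.

```python
import numpy as np


def inverted_softmax(sim: np.ndarray, beta: float = 30.0) -> np.ndarray:
    """sim[q, t]: similarity of query q to candidate t (hypothetical sketch)."""
    z = beta * sim
    # Subtracting each column's max leaves the column-wise softmax unchanged
    # but avoids overflow in exp.
    z = z - z.max(axis=0, keepdims=True)
    e = np.exp(z)
    # Normalize over queries (axis 0), not over candidates.
    return e / e.sum(axis=0, keepdims=True)


def csls(sim: np.ndarray, k: int = 10) -> np.ndarray:
    """Local-scaling correction: 2*sim - r_query - r_candidate (hypothetical sketch)."""
    k_t = min(k, sim.shape[1])
    k_q = min(k, sim.shape[0])
    # r_query[q]: mean similarity of query q to its k nearest candidates.
    r_query = np.sort(sim, axis=1)[:, -k_t:].mean(axis=1)
    # r_cand[t]: mean similarity of candidate t to its k nearest queries.
    r_cand = np.sort(sim, axis=0)[-k_q:, :].mean(axis=0)
    return 2.0 * sim - r_query[:, None] - r_cand[None, :]


# Toy example: candidate 1 is a hub, scoring highly against every query.
sim = np.array([[0.80, 0.85, 0.10],
                [0.10, 0.90, 0.10],
                [0.10, 0.85, 0.80]])
print(sim.argmax(axis=1))                    # [1 1 1]: the hub wins every query
print(csls(sim, k=2).argmax(axis=1))         # [0 1 2]
print(inverted_softmax(sim).argmax(axis=1))  # [0 1 2]
```

For a fixed query the `r_query` term is constant and does not change the ranking; it is kept only so the score matches the usual symmetric CSLS form.
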
Original language: English
Title of host publication: ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Student Research Workshop
Publisher: Association for Computational Linguistics
Pages: 169-176
Number of pages: 8
ISBN (electronic): 9781950737475
Status: Published - 2019
OKM publication type: A4 Article in conference proceedings
Event: Annual Meeting of the Association for Computational Linguistics: Student Research Workshop - Florence, Italy
Duration: 28 Jul 2019 - 2 Aug 2019

Workshop

Workshop: Annual Meeting of the Association for Computational Linguistics
Abbreviated title: SRW
Country/Territory: Italy
City: Florence
Period: 28/07/2019 - 02/08/2019
