Abstract
Latent structure models are a powerful tool for modeling language data: they can mitigate the error propagation and annotation bottleneck in pipeline systems, while simultaneously uncovering linguistic insights about the data. One challenge with end-to-end training of these models is the argmax operation, which has null gradient. In this paper, we focus on surrogate gradients, a popular strategy to deal with this problem. We explore latent structure learning through the angle of pulling back the downstream learning objective. In this paradigm, we discover a principled motivation for both the straight-through estimator (STE) as well as the recently-proposed SPIGOT – a variant of STE for structured models. Our perspective leads to new algorithms in the same family. We empirically compare the known and the novel pulled-back estimators against the popular alternatives, yielding new insight for practitioners and revealing intriguing failure cases.
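To make the argmax problem described above concrete, here is a minimal PyTorch sketch of the straight-through estimator (STE) for argmax: the forward pass emits a discrete one-hot structure, while the backward pass pretends the operation was the identity and copies the downstream gradient onto the scores. This is an illustrative sketch, not the paper's implementation; the class name `StraightThroughArgmax` and the toy loss are hypothetical.

```python
import torch


class StraightThroughArgmax(torch.autograd.Function):
    """Straight-through estimator (STE) for argmax.

    Forward: one-hot argmax, whose true gradient is null almost everywhere.
    Backward: identity surrogate that passes the downstream gradient
    straight through to the scores.
    """

    @staticmethod
    def forward(ctx, scores):
        # One-hot encoding of the argmax along the last dimension.
        index = scores.argmax(dim=-1, keepdim=True)
        return torch.zeros_like(scores).scatter_(-1, index, 1.0)

    @staticmethod
    def backward(ctx, grad_output):
        # Surrogate gradient: copy the downstream gradient unchanged.
        return grad_output


# Toy usage: the surrogate yields a nonzero gradient despite the argmax.
scores = torch.tensor([[1.0, 3.0, 2.0]], requires_grad=True)
z = StraightThroughArgmax.apply(scores)          # tensor([[0., 1., 0.]])
loss = ((z - torch.tensor([[0.0, 0.0, 1.0]])) ** 2).sum()
loss.backward()
print(scores.grad)                               # tensor([[0., 2., -2.]])
```

SPIGOT, as studied in the paper, replaces this identity backward pass with a projection step suited to structured argmax; the sketch above shows only the unstructured STE baseline.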
| Original language | English |
| --- | --- |
| Title of host publication | Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, Online, November 16-20, 2020 |
| Publisher | Association for Computational Linguistics |
| Pages | 2186-2202 |
| ISBN (electronic) | 978-1-952148-90-3 |
| DOI | |
| Status | Published - 2020 |
| OKM publication type | A4 Article in a conference publication |
| Event | Conference on Empirical Methods in Natural Language Processing, Virtual, Online. Duration: 16 Nov 2020 → 20 Nov 2020 |
Conference
| Conference | Conference on Empirical Methods in Natural Language Processing |
| --- | --- |
| Abbreviated title | EMNLP |
| City | Virtual, Online |
| Period | 16/11/2020 → 20/11/2020 |