Abstract
Latent structure models are a powerful tool for modeling language data: they can mitigate the error propagation and annotation bottleneck in pipeline systems, while simultaneously uncovering linguistic insights about the data. One challenge with end-to-end training of these models is the argmax operation, which has a null gradient. In this paper, we focus on surrogate gradients, a popular strategy to deal with this problem. We explore latent structure learning through the lens of pulling back the downstream learning objective. In this paradigm, we discover a principled motivation for both the straight-through estimator (STE) and the recently proposed SPIGOT – a variant of STE for structured models. Our perspective leads to new algorithms in the same family. We empirically compare the known and the novel pulled-back estimators against popular alternatives, yielding new insights for practitioners and revealing intriguing failure cases.
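The core difficulty named in the abstract, that argmax has a null gradient and surrogate-gradient methods substitute a usable one on the backward pass, can be illustrated with a minimal sketch. The snippet below is a hypothetical PyTorch illustration of the plain straight-through estimator for an unstructured (categorical) argmax; it is not the paper's implementation, SPIGOT's structured, projection-based backward correction is not shown, and the helper name `ste_argmax` is invented for the example.

```python
import torch

def ste_argmax(scores: torch.Tensor) -> torch.Tensor:
    """Straight-through argmax (illustrative helper, not the paper's code).

    Forward: the hard one-hot argmax of `scores`.
    Backward: gradients pass through as if argmax were the identity,
    which is the classic straight-through estimator (STE).
    """
    index = scores.argmax(dim=-1, keepdim=True)
    hard = torch.zeros_like(scores).scatter_(-1, index, 1.0)
    # Value equals `hard`; gradient w.r.t. `scores` is the identity.
    return scores + (hard - scores).detach()

# Toy usage: gradients reach `scores` even though argmax itself has a null gradient.
scores = torch.randn(3, 5, requires_grad=True)
z = ste_argmax(scores)                  # one-hot latent decision
loss = (z * torch.randn(3, 5)).sum()    # stand-in for a downstream loss
loss.backward()
print(scores.grad.shape)                # torch.Size([3, 5])
```

The same value-plus-detached-difference pattern carries over to structured decoders by replacing the one-hot argmax with a highest-scoring structure; the choice of what gradient to substitute on the backward pass is exactly where STE and SPIGOT-style estimators differ.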
Original language | English |
---|---|
Title of host publication | Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, Online, November 16-20, 2020 |
Publisher | Association for Computational Linguistics |
Pages | 2186-2202 |
ISBN (Electronic) | 978-1-952148-90-3 |
DOIs | |
Publication status | Published - 2020 |
MoE publication type | A4 Conference publication |
Event | Conference on Empirical Methods in Natural Language Processing - Virtual, Online |
Duration | 16 Nov 2020 → 20 Nov 2020 |
Conference
Conference | Conference on Empirical Methods in Natural Language Processing |
---|---|
Abbreviated title | EMNLP |
City | Virtual, Online |
Period | 16/11/2020 → 20/11/2020 |