Scheduled Sampling for Transformers

Tsvetomila Mihaylova, Andre F.T. Martins

Research output: Article in a book / conference proceedings › Conference article in proceedings › Scientific › peer-reviewed

Abstract

Scheduled sampling is a technique for avoiding one of the known problems in sequence-to-sequence generation: exposure bias. It consists of feeding the model, at training time, a mix of the teacher-forced embeddings and the model's predictions from the previous step. The technique has been used to improve model performance with recurrent neural networks (RNNs). In the Transformer model, unlike in RNNs, the generation of a new word attends to the full sentence generated so far, not only to the last word, so scheduled sampling cannot be applied in a straightforward way. We propose structural changes that allow scheduled sampling to be applied to Transformer architectures, via a two-pass decoding strategy. Experiments on two language pairs achieve performance close to a teacher-forcing baseline and show that this technique is promising for further exploration.
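The two-pass idea can be illustrated with a minimal sketch, assuming a PyTorch-style decoder. The class name TwoPassScheduledSampling, the hyperparameters, and the use of discrete argmax predictions for mixing are illustrative assumptions, not the paper's exact configuration: the first pass runs ordinary teacher forcing, the second pass decodes a sequence in which some gold tokens have been replaced by the model's own first-pass predictions, and the training loss is taken on the second-pass logits.

import torch
import torch.nn as nn

class TwoPassScheduledSampling(nn.Module):
    def __init__(self, vocab_size, d_model=512, nhead=8, num_layers=6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)
        self.out_proj = nn.Linear(d_model, vocab_size)

    def forward(self, dec_inp_ids, memory, sample_prob=0.25):
        # dec_inp_ids: (batch, T) teacher-forcing decoder inputs [BOS, y_1, ..., y_{T-1}]
        # memory:      (batch, src_len, d_model) encoder outputs
        # sample_prob: probability of feeding the model's own prediction instead of the
        #              gold token; in scheduled sampling it is increased over training
        T = dec_inp_ids.size(1)
        mask = nn.Transformer.generate_square_subsequent_mask(T).to(dec_inp_ids.device)

        # First pass: standard teacher-forced decoding to obtain model predictions
        first_hidden = self.decoder(self.embed(dec_inp_ids), memory, tgt_mask=mask)
        pred_ids = self.out_proj(first_hidden).argmax(dim=-1)  # model's guess for y_1..y_T

        # Shift predictions right so that position t holds the model's guess for y_{t-1},
        # i.e. the token that would be fed as input at step t during free-running decoding
        pred_as_inputs = torch.cat([dec_inp_ids[:, :1], pred_ids[:, :-1]], dim=1)

        # Mix gold and predicted tokens position by position (never replace BOS)
        use_model = torch.rand(dec_inp_ids.shape, device=dec_inp_ids.device) < sample_prob
        use_model[:, 0] = False
        mixed_inp = torch.where(use_model, pred_as_inputs, dec_inp_ids)

        # Second pass: decode the mixed sequence; cross-entropy against the gold
        # targets y_1..y_T is computed on these logits
        second_hidden = self.decoder(self.embed(mixed_inp), memory, tgt_mask=mask)
        return self.out_proj(second_hidden)

Gradually increasing sample_prob over training recovers the usual scheduled-sampling behaviour; the paper also discusses softer mixing at the embedding level instead of the hard argmax choice shown here.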
Original language: English
Title: Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019
Publisher: Association for Computational Linguistics
Pages: 351-356
ISBN (electronic): 9781950737475
DOI - permanent links
Status: Published - 2019
OKM publication type: A4 Article in conference proceedings
Event: Annual Meeting of the Association for Computational Linguistics: Student Research Workshop - Florence, Italy
Duration: 28 Jul 2019 - 2 Aug 2019

Workshop

Workshop: Annual Meeting of the Association for Computational Linguistics
Abbreviation: SRW
Country/Territory: Italy
City: Florence
Period: 28/07/2019 - 02/08/2019
