Abstract
In speaker-aware training, a speaker embedding is appended to the DNN input features. This allows the DNN to learn representations that are robust to speaker variability. We apply speaker-aware training to attention-based end-to-end speech recognition and show that it can improve over a purely end-to-end baseline. We also propose speaker-aware training as a viable method to leverage untranscribed, speaker-annotated data. We apply state-of-the-art embedding approaches: both i-vectors and neural embeddings such as x-vectors. We experiment with embeddings trained in two conditions: on the fixed ASR data and on a large untranscribed dataset. We run our experiments on the TED-LIUM and Wall Street Journal datasets. No embedding consistently outperforms all others, but in many settings neural embeddings outperform i-vectors.
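The input-feature augmentation described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes frame-level acoustic features (for example filterbank vectors) and a single fixed-length, utterance-level speaker embedding (such as an i-vector or x-vector) that is tiled across all frames. The function name `append_speaker_embedding` and the toy dimensions are hypothetical, and the paper's exact point of concatenation may differ.

```python
from typing import List, Sequence


def append_speaker_embedding(
    frames: Sequence[Sequence[float]],
    speaker_embedding: Sequence[float],
) -> List[List[float]]:
    """Concatenate one utterance-level speaker embedding to every frame.

    Hypothetical layout (an assumption, not taken from the paper):
      frames: T frames, each a sequence of D acoustic features.
      speaker_embedding: an E-dimensional vector, e.g. an i-vector or x-vector.
    Returns T frames of dimension D + E.
    """
    if not frames:
        return []
    dim = len(frames[0])
    if any(len(frame) != dim for frame in frames):
        raise ValueError("all frames must have the same feature dimension")
    emb = list(speaker_embedding)
    # The same embedding is repeated for every frame, so it stays constant
    # within the utterance while the acoustic features vary over time.
    return [list(frame) + emb for frame in frames]


if __name__ == "__main__":
    # Toy example: 3 frames of 4 features and a 2-dimensional speaker embedding.
    feats = [[0.1, 0.2, 0.3, 0.4] for _ in range(3)]
    spk = [1.0, -1.0]
    augmented = append_speaker_embedding(feats, spk)
    print(len(augmented), len(augmented[0]))  # 3 6
```

Because the embedding extractor is trained separately from the recognizer, it can in principle be trained on untranscribed data that only carries speaker labels, which is the setting the abstract proposes to exploit.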
Original language | English |
---|---|
Title of host publication | 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings |
Publisher | IEEE |
Pages | 7064-7068 |
Number of pages | 5 |
ISBN (Electronic) | 9781509066315 |
DOIs | |
Publication status | Published - May 2020 |
MoE publication type | A4 Article in a conference publication |
Event | IEEE International Conference on Acoustics, Speech, and Signal Processing; virtual conference, Barcelona, Spain; 4 May 2020 → 8 May 2020; conference number 45 |
Publication series
Name | Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing |
---|---|
ISSN (Print) | 1520-6149 |
ISSN (Electronic) | 2379-190X |
Conference
Conference | IEEE International Conference on Acoustics, Speech, and Signal Processing |
---|---|
Abbreviated title | ICASSP |
Country/Territory | Spain |
City | Barcelona |
Period | 04/05/2020 → 08/05/2020 |
Other | Virtual conference |
Keywords
- end-to-end speech recognition
- speaker embedding
- speaker-adaptation
- speaker-aware training
Fingerprint
Dive into the research topics of 'Speaker-Aware Training of Attention-Based End-to-End Speech Recognition Using Neural Speaker Embeddings'. Together they form a unique fingerprint.
Projects
1 finished project:
MeMAD
Kurimo, M., Grönroos, S., Brander, T., Porjazovski, D., Raitio, R., Rouhe, A., Grósz, T. & Virkkunen, A.
27/12/2017 → 31/03/2021
Project: EU: Framework programmes funding