Abstract
Speaker adaptation is an important step in optimizing and personalizing the performance of automatic speech recognition (ASR) for individual users. While many applications target rapid adaptation through various global transformations, slower adaptation that achieves a higher level of personalization would be useful for many active ASR users, especially those whose speech is not recognized well. This paper studies the outcome of combining maximum a posteriori (MAP) adaptation with compression of Gaussian mixture models. An important result that has received little previous attention is that MAP adaptation can be used to radically decrease the size of the models as they are tuned to a particular speaker. This is particularly relevant for small personal devices, which should provide accurate real-time recognition while keeping memory, computation, and electricity consumption low. With our method, we are able to decrease model complexity with MAP adaptation while increasing accuracy.
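For readers unfamiliar with MAP adaptation, the following is a minimal sketch of the commonly used relevance-factor MAP update for a single Gaussian mean. It is not taken from the paper and says nothing about the compression step. The function name `map_adapt_mean`, the one-dimensional setting, and the default `tau` are assumptions made for illustration only.

```python
def map_adapt_mean(prior_mean, frames, posteriors, tau=10.0):
    """Relevance-MAP update of one Gaussian mean (1-D for brevity).

    prior_mean: speaker-independent mean of this Gaussian
    frames:     adaptation observations from the target speaker
    posteriors: occupation probability of this Gaussian for each frame
    tau:        relevance factor (illustrative default, an assumption)
    """
    # Soft count of frames assigned to this Gaussian.
    occupancy = sum(posteriors)
    # Posterior-weighted sum of the adaptation data.
    weighted_sum = sum(g * x for g, x in zip(posteriors, frames))
    # Interpolate between the prior mean and the data mean.
    return (tau * prior_mean + weighted_sum) / (tau + occupancy)


# With no adaptation data the mean stays at the prior.
unchanged = map_adapt_mean(0.0, [], [])
# With occupancy equal to tau the result is halfway to the data mean.
halfway = map_adapt_mean(0.0, [2.0] * 10, [1.0] * 10, tau=10.0)
```

The update moves a mean toward the speaker's data only as far as the accumulated occupancy justifies, so Gaussians that see little adaptation data stay close to the speaker-independent model.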
Original language | English |
---|---|
Title of host publication | Proceedings of the 21st Nordic Conference on Computational Linguistics, NoDaLiDa, 22-24 May 2017, Gothenburg, Sweden |
Editors | Jörg Tiedemann |
Publisher | Linköping University Electronic Press |
Pages | 65-69 |
Number of pages | 5 |
ISBN (Print) | 978-91-7685-601-7 |
Publication status | Published - 2017 |
MoE publication type | A4 Conference publication |
Event | Nordic Conference on Computational Linguistics, Gothenburg, Sweden. Duration: 22 May 2017 → 24 May 2017. Conference number: 21 |
Publication series
Name | Linköping Electronic Conference Proceedings |
---|---|
Publisher | Linköping University Electronic Press |
Volume | 131 |
ISSN (Print) | 1650-3686 |
ISSN (Electronic) | 1650-3740 |
Conference
Conference | Nordic Conference on Computational Linguistics |
---|---|
Abbreviated title | NoDaLiDa |
Country/Territory | Sweden |
City | Gothenburg |
Period | 22/05/2017 → 24/05/2017 |
Keywords
- MAP adaptation
- acoustic model adaptation
- Speech recognition
- Compression
- acoustic model compression
- Speaker adaptation