Abstract
We describe the speech recognition systems we built for MGB-3, the third Multi-Genre Broadcast challenge. This year's task was to build a system for transcribing Egyptian Dialect Arabic speech, using a large audio corpus of primarily Modern Standard Arabic speech and only a small amount (5 hours) of Egyptian adaptation data. Our system, a combination of different acoustic models, language models and lexical units, achieved a Multi-Reference Word Error Rate of 29.25%, the lowest in the competition. On the older MGB-2 task, which was run again to indicate progress, we also achieved the lowest error rate: 13.2%.
The result comes from combining state-of-the-art speech recognition methods: simple dialect adaptation for a Time-Delay Neural Network (TDNN) acoustic model (-27% errors compared to the baseline), Recurrent Neural Network Language Model (RNNLM) rescoring (an additional -5%), and system combination with Minimum Bayes Risk (MBR) decoding (yet another -10%). We also explored morph and character language models, which were particularly beneficial in providing a rich pool of systems for the MBR decoding.
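For readers unfamiliar with the metric behind these figures, the following is a minimal sketch of the standard single-reference word error rate: the word-level edit distance between a reference transcript and a hypothesis, divided by the reference length. It is an illustration only. The function name `word_error_rate` is hypothetical, whitespace tokenization is an assumption, and this is not the Multi-Reference Word Error Rate used for MGB-3 scoring, nor the authors' evaluation tooling.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Single-reference WER: (substitutions + deletions + insertions) / reference length.

    Assumption: words are separated by whitespace and compared as exact strings.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against a four-word reference.
    print(word_error_rate("a b c d", "a x c"))
```

Because the denominator is the reference length, the value can exceed 1.0 when the hypothesis contains many insertions.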
Original language | English |
---|---|
Title of host publication | Automatic Speech Recognition and Understanding (ASRU), IEEE Workshop on |
Publisher | IEEE |
Pages | 338-345 |
ISBN (electronic) | 978-1-5090-4788-8 |
ISBN (print) | 978-1-5090-4789-5 |
DOIs | |
Status | Published - 2018 |
Ministry of Education (OKM) publication type | A4 Article in conference proceedings |
Event | IEEE Automatic Speech Recognition and Understanding Workshop - Okinawa, Japan. Duration: 16 Dec 2017 → 20 Dec 2017. https://asru2017.org/ |
Workshop
Workshop | IEEE Automatic Speech Recognition and Understanding Workshop |
---|---|
Abbreviated title | ASRU |
Country/Territory | Japan |
City | Okinawa |
Period | 16/12/2017 → 20/12/2017 |
Internet address |
Fingerprint
Dive into the research topics of 'Aalto system for the 2017 Arabic multi-genre broadcast challenge'. Together they form a unique fingerprint.
Prizes
MGB3 2017: Multi Genre Broadcast challenge for recognizing Arabic dialect speech
Smit, P. (Recipient), Gangireddy, S. (Recipient), Enarvi, S. (Recipient), Virpioja, S. (Recipient) & Kurimo, M. (Recipient), 2017
Prize: Placement in a competition or participation in an invitational competition