Aalto system for the 2017 Arabic multi-genre broadcast challenge

Peter Smit, Siva Gangireddy, Seppo Enarvi, Sami Virpioja, Mikko Kurimo

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceedings › Scientific › peer-review


We describe the speech recognition systems we built for MGB-3, the third Multi-Genre Broadcast challenge. This year's task was to build a system for transcribing Egyptian dialect Arabic speech, using a large audio corpus of primarily Modern Standard Arabic speech and only a small amount (5 hours) of Egyptian adaptation data. Our system, a combination of different acoustic models, language models and lexical units, achieved a Multi-Reference Word Error Rate of 29.25%, the lowest in the competition. On the older MGB-2 task, which was run again to indicate progress, we also achieved the lowest error rate: 13.2%.

This result comes from combining state-of-the-art speech recognition methods: simple dialect adaptation for a Time-Delay Neural Network (TDNN) acoustic model (-27% errors compared to the baseline), Recurrent Neural Network Language Model (RNNLM) rescoring (an additional -5%), and system combination with Minimum Bayes Risk (MBR) decoding (yet another -10%). We also explored the use of morph and character language models, which proved particularly beneficial in providing a rich pool of systems for the MBR decoding.
Original language: English
Title of host publication: Automatic Speech Recognition and Understanding (ASRU), IEEE Workshop on
ISBN (Electronic): 978-1-5090-4788-8
ISBN (Print): 978-1-5090-4789-5
Publication status: Published - 2018
MoE publication type: A4 Conference publication
Event: IEEE Automatic Speech Recognition and Understanding Workshop - Okinawa, Japan
Duration: 16 Dec 2017 to 20 Dec 2017


Workshop: IEEE Automatic Speech Recognition and Understanding Workshop
Abbreviated title: ASRU