Character-based units for Unlimited Vocabulary Continuous Speech Recognition

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Scientific, peer-review


We study character-based language models in the state-of-the-art speech recognition framework. This approach has advantages over both word-based systems and so-called end-to-end ASR systems, which do not have separate acoustic and language models. We describe the modifications needed to build an effective character-based ASR system using the Kaldi toolkit, and we evaluate models based on words, statistical morphs, and characters for both Finnish and Arabic. The morph-based models yield the best recognition results for both well-resourced and lower-resourced tasks, but in the lower-resourced tasks the character-based models come close to their performance and outperform the word-based models. Character-based models are especially good at predicting novel word forms that were not seen in the training data. Using character-based neural network language models is computationally efficient and provides a larger gain than in the morph- and word-based systems.
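The claim that character-based models handle novel word forms can be illustrated with a small sketch. It is not the paper's Kaldi recipe: it assumes a hypothetical explicit word-boundary token `<w>` (the paper's actual boundary-marking scheme is not stated here), and the function names `words_to_chars`, `chars_to_words`, and `oov_rate` are illustrative. It shows why a closed character inventory leaves almost no test units out of vocabulary, even when whole words are unseen.

```python
from typing import Callable

# Hypothetical word-boundary token; the paper's actual marking scheme may differ.
BOUNDARY = "<w>"


def words_to_chars(sentence: str) -> list[str]:
    """Split a word-level transcript into character units, inserting an
    explicit boundary token between words so words can be recovered."""
    units: list[str] = []
    for i, word in enumerate(sentence.split()):
        if i > 0:
            units.append(BOUNDARY)
        units.extend(word)  # one unit per character
    return units


def chars_to_words(units: list[str]) -> str:
    """Invert words_to_chars: join characters and split at boundary tokens."""
    words: list[str] = []
    current: list[str] = []
    for unit in units:
        if unit == BOUNDARY:
            words.append("".join(current))
            current = []
        else:
            current.append(unit)
    words.append("".join(current))
    return " ".join(words)


def oov_rate(train: list[str], test: list[str],
             tokenize: Callable[[str], list[str]]) -> float:
    """Fraction of test units (under the given tokenizer) never seen in training."""
    vocab = {u for s in train for u in tokenize(s)}
    test_units = [u for s in test for u in tokenize(s)]
    unseen = sum(1 for u in test_units if u not in vocab)
    return unseen / len(test_units)


if __name__ == "__main__":
    # Toy Finnish-like data: "kissassa" is an inflected form absent from training.
    train = ["talo on iso", "talossa on kissa"]
    test = ["kissassa on talo"]
    print("word OOV rate:", oov_rate(train, test, str.split))
    print("char OOV rate:", oov_rate(train, test, words_to_chars))
```

On this toy data, one of the three test words is out of vocabulary at the word level, while every character unit was seen in training, so a character-level model can still assign the unseen inflected form a probability.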


Original language: English
Title of host publication: Automatic Speech Recognition and Understanding (ASRU), IEEE Workshop on
Publication status: Published, 2018
MoE publication type: A4 Article in a conference publication
Event: IEEE Automatic Speech Recognition and Understanding Workshop, Okinawa, Japan
Duration: 16 Dec 2017 to 20 Dec 2017


Workshop: IEEE Automatic Speech Recognition and Understanding Workshop
Abbreviated title: ASRU

ID: 14903665