First-Pass Techniques for Very Large Vocabulary Speech Recognition of Morphologically Rich Languages

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review


Research units

  • Utopia Analytics Oy


In speech recognition of morphologically rich languages, very large vocabularies are required to achieve good error rates. Traditional n-gram language models trained over word sequences suffer in particular from data sparsity. Language modelling can often be improved by segmenting words into sequences of more frequent subword units. Another solution is to cluster the words into classes and apply a class-based language model. We show that linearly interpolating n-gram models trained over words, subwords, and word classes improves first-pass speech recognition accuracy in very large vocabulary tasks for two morphologically rich and agglutinative languages, Finnish and Estonian. To overcome performance issues, we also introduce a novel language model look-ahead method that utilizes a class bigram model. The method improves on a unigram look-ahead model at the same recognition speed, with the difference growing for small real-time factors. The improved model combination and look-ahead method are useful when real-time recognition is required or when the improved hypotheses help subsequent recognition passes. For instance, neural network language models are mostly applied by rescoring generated hypotheses, due to their higher computational cost.
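The linear interpolation of word, subword, and class n-gram models mentioned in the abstract can be illustrated with a minimal sketch. The function below combines three per-word probability estimates with fixed interpolation weights; the weight values and probabilities here are illustrative assumptions, not figures from the paper.

```python
def interpolate(p_word, p_subword, p_class, weights=(0.5, 0.3, 0.2)):
    """Linearly interpolate three language model estimates for one word
    in context: P(w|h) = lambda_1 * P_word(w|h)
                       + lambda_2 * P_subword(w|h)
                       + lambda_3 * P_class(w|h).
    The weights are hypothetical; in practice they would be tuned,
    e.g. to minimize perplexity on held-out data."""
    lw, ls, lc = weights
    # Interpolation weights must form a convex combination.
    assert abs(lw + ls + lc - 1.0) < 1e-9
    return lw * p_word + ls * p_subword + lc * p_class

# Example: combine three (made-up) probability estimates for one word.
p = interpolate(p_word=0.1, p_subword=0.2, p_class=0.3)
```

Because each component is a valid probability distribution and the weights sum to one, the interpolated model remains a valid distribution while smoothing over the sparsity of the pure word model.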


Original language: English
Title of host publication: 2018 IEEE Spoken Language Technology Workshop, December 18-21, 2018, Athens, Greece
Publication status: Published - 2018
MoE publication type: A4 Article in a conference publication
Event: IEEE Spoken Language Technology Workshop - Athens, Greece
Duration: 18 Dec 2018 - 21 Dec 2018


Workshop: IEEE Spoken Language Technology Workshop
Abbreviated title: SLT

    Research areas

  • speech recognition
  • morphologically rich languages
  • decoding
  • class n-gram models
  • subword n-gram models

ID: 30566774