In automatic speech recognition of agglutinative and morphologically rich languages, the recognition vocabulary may in many tasks need to cover several million word forms. This poses challenges for the search component of the speech recognizer, because real-time recognition speed is often preferred while the number of possible recognition hypotheses is large. A typical modern large-vocabulary speech recognizer uses a probabilistic language model to assign prior probabilities to word sequences. Estimating accurate language models from a text corpus also becomes harder because of increased data sparsity. So far, the most successful approach to speech recognition of morphologically rich languages has been to segment words into shorter, more frequently occurring units, which alleviates the estimation problems. Moreover, if all concatenations of subwords are allowed, the recognition vocabulary is unlimited. This thesis concentrates on approaches in which a limited but very large recognition vocabulary is used. In addition to subword-based language models, this type of recognizer can also use language models trained over words and word classes to reach improved modeling accuracy. For the case where only a subword language model is used, the thesis presents a novel way of constructing the recognition graph. In this case, the recognition vocabulary is easy to augment with new word forms by using resources such as dictionaries and morphological analyzers. The constrained recognition vocabulary approaches are shown to be viable choices in many speech recognition use cases. It is also shown that, with a constrained vocabulary, the search can operate in real time and even faster than when the recognition vocabulary is unlimited. In addition, the recognition of non-words is avoided, and the recognition accuracy may exceed that of the unlimited vocabulary approach if a low enough out-of-vocabulary rate is reached.
In one part of the thesis, human word recognition performance is analyzed using statistical morphological models in a visual lexical decision task in which the participants' eye movements were also recorded using eye tracking. The Morfessor Baseline method, which segments only the infrequent words, predicted the observations well in most of the experiments. This finding supports the corresponding model of word recognition in humans.
|Translated title of the contribution||Menetelmiä laajan sanaston kielimallinnukseen ja puheentunnistukseen morfologisesti rikkaille kielille|
|Publication status||Published - 2020|
|MoE publication type||G5 Doctoral dissertation (article)|
- automatic speech recognition
- morphologically rich languages
- language modeling