Modern automatic speech recognition (ASR) systems are speaker-independent and designed to recognize continuous, large-vocabulary speech. The key components of an ASR system are the acoustic model, the language model, the lexicon and the decoder. A persistent challenge for an ASR system is how to adapt over time to changing topics and to newly introduced names and words. Continuous topic adaptation requires finding new, relevant text sources for adapting the language model and identifying words that need new or modified pronunciation rules. In this thesis, unsupervised methods that enable continuous topic adaptation for a Finnish morph-based ASR system are studied. Based on first-pass ASR output, topically and temporally relevant text data is retrieved from a collection of pre-indexed Web texts. Adapting the background language model with the best-matching texts improves recognition accuracy, including that of foreign names and acronyms, which are one of the focus areas of this thesis. Further improvement is achieved by identifying foreign names and acronyms in the retrieved texts and generating adapted pronunciation rules for them. In statistical morph-based ASR, words are sometimes oversegmented. To make the mapping of adapted pronunciation rules easier and more reliable, oversegmented foreign names and acronyms are restored into their base forms. Morpheme restoration also improves recognition accuracy slightly. User feedback is also explored in this thesis as a means of ongoing lexicon adaptation for ASR systems. Based on user corrections of ASR output, optimal pronunciation rules for misrecognized words are recovered using forced alignment and Viterbi decoding. A collection of recovered pronunciation rules can then be used to recognize new speech data. Experiments showed minor improvements in the recognition of foreign names with user-feedback-based lexicon adaptation.
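The retrieval-and-adaptation step can be illustrated with a minimal sketch. The abstract does not say how the texts are ranked or how the language models are combined, so this example *assumes* TF-IDF cosine similarity for ranking pre-indexed texts against the first-pass transcript and linear interpolation of unigram probabilities, P(w) = λ·P_adapt(w) + (1 − λ)·P_bg(w). The function names (`retrieve`, `interpolate`), the weight `lam` and the toy data are hypothetical; a real system would interpolate n-gram models over morphs rather than word unigrams.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: tf-idf weight} dict per tokenized document, plus document frequencies."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vecs = [{t: c * math.log(n / df[t]) for t, c in Counter(doc).items()} for doc in docs]
    return vecs, df


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(first_pass, docs, k):
    """Hypothetical ranking step: indices of the k texts most similar to the first-pass output."""
    vecs, df = tfidf_vectors(docs)
    n = len(docs)
    # Query terms never seen in the collection carry no IDF weight and are skipped.
    query = {t: c * math.log(n / df[t]) for t, c in Counter(first_pass).items() if t in df}
    ranked = sorted(range(n), key=lambda i: cosine(query, vecs[i]), reverse=True)
    return ranked[:k]


def interpolate(background, adapt_tokens, lam):
    """Assumed adaptation: linearly interpolate a background unigram model with one estimated from retrieved text."""
    counts = Counter(adapt_tokens)
    total = sum(counts.values())
    if total == 0:
        return dict(background)  # nothing retrieved: keep the background model
    vocab = set(background) | set(counts)
    return {w: lam * counts[w] / total + (1 - lam) * background.get(w, 0.0) for w in vocab}


if __name__ == "__main__":
    docs = [
        "hallitus päätti budjetista tänään".split(),
        "obama vieraili helsingissä".split(),
        "jalkapallo ottelu päättyi tasan".split(),
    ]
    first_pass = "presidentti obama saapui helsingissä".split()
    top = retrieve(first_pass, docs, k=1)
    bg = {"obama": 0.1, "helsingissä": 0.2, "tänään": 0.7}
    adapted = interpolate(bg, [t for i in top for t in docs[i]], lam=0.5)
    print(top, adapted)
```

In this toy run, the second text shares the name "obama" and the place "helsingissä" with the first-pass output, so it is retrieved. Interpolation then raises the probability of "obama" from 0.1 to about 0.217 and gives the previously unseen word "vieraili" a nonzero probability. Because the result is a convex combination of two distributions, it still sums to one.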
|Status||Published - 2017|
|Ministry of Education and Culture publication type (OKM)||G5 Doctoral dissertation (article-based)|