Towards Efficient and Robust Automatic Speech Recognition: Decoding Techniques and Discriminative Training

Janne Pylkkönen

    Research output: Thesis, Doctoral Thesis, Collection of Articles

    Abstract

    Automatic speech recognition has been widely studied and is already being applied in everyday use. Nevertheless, recognition performance is still a bottleneck in many practical applications of large vocabulary continuous speech recognition. Either the recognition speed is not sufficient, or the errors in the recognition result limit the applications. This thesis studies two aspects of speech recognition, decoding and training of acoustic models, to improve speech recognition performance in different conditions. A major part of this thesis studies discriminative training of acoustic models. The emphasis is on the most popular algorithm for discriminative model estimation, the extended Baum-Welch algorithm. The thesis points out theoretical connections of the algorithm to general constrained optimization. It also proposes new control methods for the algorithm, which are shown to improve the robustness of the acoustic models in several large vocabulary speech recognition tasks. Discriminative training methods are widely applied in state-of-the-art speech recognizers, which utilize the prevalent hidden Markov models for acoustic modeling. Therefore, the proposed methods have many immediate practical applications. The speech recognition system developed at Aalto University was utilized and significantly improved during the research for this thesis. The thesis gives an overview of that system and describes the decoder of the system in more detail. In speech recognition systems, the decoder combines the information from the statistical models of acoustics and language to implement the search for the word sequence which best matches the input speech. The thesis proposes new methods for improving the speed of this search without loss of recognition accuracy.
    Translated title of the contribution: Kohti tehokasta ja häiriöitä sietävää automaattista puheentunnistusta: tekniikoita dekoodaukseen ja diskriminatiiviseen opetukseen
    Original language: English
    Qualification: Doctor's degree
    Awarding Institution
    • Aalto University
    Supervisors/Advisors
    • Oja, Erkki, Supervising Professor
    • Kurimo, Mikko, Thesis Advisor
    Publisher
    Print ISBNs: 978-952-60-5063-8
    Electronic ISBNs: 978-952-60-5064-5
    Publication status: Published - 2013
    MoE publication type: G5 Doctoral dissertation (article)

    Keywords

    • automatic speech recognition
    • decoder
    • acoustic modeling
    • discriminative training
    • extended Baum-Welch
