Glottal source estimation from coded telephone speech using a deep neural network

Narendra Nonavinakere Prabhakera, Manu Airaksinen, Paavo Alku

    Research output: Chapter in Book/Report/Conference proceeding, Conference article in proceedings, Scientific, peer-review

    4 Citations (Scopus)
    423 Downloads (Pure)

    Abstract

    In speech analysis, information about the glottal source is obtained from speech by using glottal inverse filtering (GIF). The accuracy of state-of-the-art GIF methods is sufficiently high when the input speech signal is of high quality (i.e., with little noise or reverberation). However, in realistic conditions, particularly when GIF is computed from coded telephone speech, the accuracy of GIF methods deteriorates severely. To robustly estimate the glottal source under coded conditions, a deep neural network (DNN)-based method is proposed. The proposed method uses a DNN to map the speech features extracted from the coded speech to the glottal flow waveform estimated from the corresponding clean speech. To generate the coded telephone speech, the adaptive multi-rate (AMR) codec, a widely used speech compression method, is utilized. The proposed glottal source estimation method is compared with two existing GIF methods, closed phase covariance analysis (CP) and iterative adaptive inverse filtering (IAIF). The results indicate that the proposed DNN-based method is capable of estimating glottal flow waveforms from coded telephone speech with considerably better accuracy than CP and IAIF.
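
    The mapping described in the abstract (features from coded speech in, glottal flow waveform from clean speech as the training target) can be illustrated with a minimal regression sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the feature dimension (20), the waveform frame length (40), the single tanh hidden layer of 32 units, plain full-batch gradient descent, and the synthetic random data that stands in for real coded-speech features and GIF-derived targets. The function names `init_mlp`, `forward`, `train_step`, and `demo` are likewise hypothetical.

```python
import numpy as np

def init_mlp(n_in, n_hidden, n_out, rng):
    """Small random weights for a one-hidden-layer regression network."""
    return {
        "W1": rng.normal(0.0, 0.1, (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, (n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }

def forward(params, X):
    """Return (hidden activations, predicted waveform frames)."""
    H = np.tanh(X @ params["W1"] + params["b1"])
    Y = H @ params["W2"] + params["b2"]
    return H, Y

def mse(params, X, T):
    """Mean squared error between predicted and target frames."""
    return float(np.mean((forward(params, X)[1] - T) ** 2))

def train_step(params, X, T, lr=0.2):
    """One full-batch gradient-descent step on the mean squared error."""
    H, Y = forward(params, X)
    dY = 2.0 * (Y - T) / Y.size          # gradient of the mean over all elements
    dZ = (dY @ params["W2"].T) * (1.0 - H ** 2)  # backprop through tanh
    params["W2"] -= lr * (H.T @ dY)
    params["b2"] -= lr * dY.sum(axis=0)
    params["W1"] -= lr * (X.T @ dZ)
    params["b1"] -= lr * dZ.sum(axis=0)

def demo(n_frames=256, n_feat=20, n_wave=40, n_steps=200, seed=0):
    """Train on synthetic data and return (initial error, final error)."""
    rng = np.random.default_rng(seed)
    # Hypothetical stand-ins: X plays the role of per-frame features from
    # coded speech, T the role of glottal flow frames from clean speech.
    X = rng.normal(size=(n_frames, n_feat))
    T = np.tanh(X @ rng.normal(0.0, 0.3, (n_feat, n_wave)))
    params = init_mlp(n_feat, 32, n_wave, rng)
    before = mse(params, X, T)
    for _ in range(n_steps):
        train_step(params, X, T)
    return before, mse(params, X, T)

if __name__ == "__main__":
    before, after = demo()
    print(f"MSE before training: {before:.4f}, after: {after:.4f}")
```

    In a real setting the synthetic arrays would be replaced by paired data: features computed from AMR-coded utterances and reference glottal flow frames obtained by inverse filtering the matching clean recordings.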
    Original language: English
    Title of host publication: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
    Publisher: International Speech Communication Association (ISCA)
    Pages: 3931-3935
    Number of pages: 5
    Volume: 2017-August
    ISBN (Print): 978-1-5108-4876-4
    Publication status: Published - Aug 2017
    MoE publication type: A4 Conference publication
    Event: Interspeech - Stockholm, Sweden
    Duration: 20 Aug 2017 to 24 Aug 2017
    Conference number: 18
    http://www.interspeech2017.org/

    Publication series

    Name: Interspeech: Annual Conference of the International Speech Communication Association
    ISSN (Electronic): 1990-9772

    Conference

    Conference: Interspeech
    Country/Territory: Sweden
    City: Stockholm
    Period: 20/08/2017 to 24/08/2017

    Keywords

    • glottal source estimation
    • glottal inverse filtering
    • deep neural network
    • telephone speech
