Spectral modification for recognition of children’s speech under mismatched conditions

Hemant Kathania, Sudarsana Kadiri, Paavo Alku, Mikko Kurimo

    Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceedings › Scientific › peer-review


    Abstract

    In this paper, we propose spectral modification by sharpening formants and reducing the spectral tilt to recognize children’s speech with automatic speech recognition (ASR) systems developed using adult speech. In this type of mismatched condition, ASR performance degrades because of the acoustic and linguistic mismatch between child and adult speakers. The proposed method is used to improve speech intelligibility and thereby enhance the recognition of children’s speech with an acoustic model trained on adult speech. In the experiments, WSJCAM0 and PFSTAR are used as the databases for adults’ and children’s speech, respectively. The proposed technique gives a significant improvement in the context of DNN-HMM-based ASR. Furthermore, we validate the robustness of the technique by showing that it also performs well in mismatched noise conditions.
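
The abstract names spectral tilt reduction without giving details. As a rough illustration only, the sketch below uses a first-order pre-emphasis filter, a common textbook way to flatten spectral tilt by boosting high frequencies relative to low ones. This is an assumption made for illustration and is not claimed to be the method used in the paper; the function names `pre_emphasis` and `tone_gain`, the coefficient `alpha=0.97`, and the 16 kHz sample rate are all hypothetical choices. Formant sharpening is not covered here.

```python
import math


def pre_emphasis(signal, alpha=0.97):
    """Apply y[n] = x[n] - alpha * x[n-1] (illustrative tilt reduction).

    The first sample is passed through unchanged. `alpha` is an assumed
    value, not one taken from the paper.
    """
    if not signal:
        return []
    out = [signal[0]]
    for n in range(1, len(signal)):
        out.append(signal[n] - alpha * signal[n - 1])
    return out


def tone_gain(freq_hz, sample_rate=16000, alpha=0.97, n=1600):
    """Measure the RMS gain of the filter on a pure sine tone."""
    x = [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]
    y = pre_emphasis(x, alpha)

    def rms(values):
        return math.sqrt(sum(v * v for v in values) / len(values))

    return rms(y) / rms(x)


# Compare how strongly a low and a high frequency pass through the filter.
low_gain = tone_gain(200.0)
high_gain = tone_gain(4000.0)
print("gain at 200 Hz:", low_gain)
print("gain at 4000 Hz:", high_gain)
```

The high-frequency tone comes out with a larger gain than the low-frequency tone, which is the sense in which such a filter reduces the downward tilt of a speech spectrum.
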
    Original language: English
    Title of host publication: Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)
    Place of Publication: Sweden
    Publisher: Linköping University Electronic Press
    Pages: 94–100
    Number of pages: 7
    ISBN (Electronic): 978-91-7929-614-8
    Publication status: Published - 31 May 2021
    MoE publication type: A4 Conference publication
    Event: Nordic Conference on Computational Linguistics - Reykjavik, Iceland
    Duration: 31 May 2021 - 2 Jun 2021

    Publication series

    Name: Linköping electronic conference proceedings
    Number: 178
    ISSN (Print): 1650-3686
    ISSN (Electronic): 1650-3740

    Conference

    Conference: Nordic Conference on Computational Linguistics
    Abbreviated title: NoDaLiDa
    Country/Territory: Iceland
    City: Reykjavik
    Period: 31/05/2021 - 02/06/2021
