Time-regularized linear prediction for noise-robust extraction of the spectral envelope of speech

Research output: Chapter in Book/Report/Conference proceeding; Conference article in proceedings; Scientific; peer-review

9 Citations (Scopus)
270 Downloads (Pure)

Abstract

Feature extraction of speech signals is typically performed in short-time frames by assuming that the signal is stationary within each frame. For the extraction of the spectral envelope of speech, which conveys the formant frequencies produced by the resonances of the slowly varying vocal tract, a commonly used frame length is 20-30 ms. However, this kind of conventional frame-based spectral analysis is oblivious to the broader temporal context of the signal and is prone to degradation by, for example, environmental noise. In this paper, we propose a new frame-based linear prediction (LP) analysis method that includes a regularization term that penalizes energy differences in consecutive frames of an all-pole spectral envelope model. This integrates the slowly varying nature of the vocal tract into the analysis. Objective evaluations related to feature distortion and phonetic representational capability were performed by studying the properties of the mel-frequency cepstral coefficient (MFCC) representations computed from different spectral estimation methods under noisy conditions using the TIMIT database. The results show that the proposed time-regularized LP approach exhibits superior MFCC distortion behavior while simultaneously having the greatest average separability of different phoneme categories in comparison to the other methods.
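
The idea of coupling consecutive frames through a regularization term can be illustrated with a minimal sketch. This is not the paper's formulation: the abstract describes a penalty on energy differences of the all-pole envelope model, whereas the sketch below assumes a simpler, causal stand-in that penalizes the squared difference between the current and previous frame's predictor coefficients. The function names, the `lam` weight, and the scaling by the frame energy are all hypothetical choices made for illustration.

```python
import numpy as np


def lp_normal_equations(frame, order):
    """Autocorrelation-method LP: return the Toeplitz matrix R and vector r."""
    n = len(frame)
    # Autocorrelation for lags 0..order
    ac = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    # R[i, j] = ac[|i - j|]
    idx = np.abs(np.arange(order)[:, None] - np.arange(order)[None, :])
    return ac[idx], ac[1 : order + 1]


def time_regularized_lp(frames, order=10, lam=0.0):
    """Per-frame LP coefficients with a penalty toward the previous frame.

    ASSUMPTION (not from the paper): each frame minimizes
        a^T R a - 2 a^T r + w * ||a - a_prev||^2,   w = lam * R[0, 0],
    which gives the linear system (R + w I) a = r + w a_prev.
    With lam = 0 this reduces to conventional frame-wise LP.
    """
    frames = np.asarray(frames, dtype=float)
    coeffs = np.zeros((len(frames), order))
    a_prev = None
    for t, frame in enumerate(frames):
        R, r = lp_normal_equations(frame, order)
        # The first frame has no predecessor, so it is left unregularized
        w = lam * R[0, 0] if a_prev is not None else 0.0
        target = a_prev if a_prev is not None else np.zeros(order)
        # Small diagonal loading keeps the system solvable for silent frames
        eps = 1e-9 * R[0, 0] + 1e-12
        a = np.linalg.solve(R + (w + eps) * np.eye(order), r + w * target)
        coeffs[t] = a
        a_prev = a
    return coeffs
```

Increasing `lam` in this sketch pulls each frame's coefficients toward those of the preceding frame, which mimics the slowly varying vocal tract assumption at the cost of smearing genuinely fast spectral changes.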
Original language: English
Title of host publication: Proceedings of Interspeech
Publisher: International Speech Communication Association (ISCA)
Pages: 701-705
Number of pages: 5
DOIs
Publication status: Published - 2 Sept 2018
MoE publication type: A4 Conference publication
Event: Interspeech - Hyderabad International Convention Centre, Hyderabad, India
Duration: 2 Sept 2018 to 6 Sept 2018
http://interspeech2018.org/

Publication series

Name: Interspeech - Annual Conference of the International Speech Communication Association
Publisher: International Speech Communication Association
ISSN (Electronic): 2308-457X

Conference

Conference: Interspeech
Country/Territory: India
City: Hyderabad
Period: 02/09/2018 to 06/09/2018

Projects

  • Interdisciplinary research on statistical parametric speech synthesis

    Alku, P. (Principal investigator), Bäckström, T. (Project Member), Nonavinakere Prabhakera, N. (Project Member), Bollepalli, B. (Project Member), Murtola, T. (Project Member), Airaksinen, M. (Project Member) & Juvela, L. (Project Member)

    01/01/2018 to 31/12/2019

    Project: Academy of Finland: Other research funding

  • Personalized Speech Synthesis: Assistive Technology for People with Communication Disabilities

    Alku, P. (Principal investigator), Raitio, T. (Project Member), Pohjalainen, J. (Project Member), Juvela, L. (Project Member), Pulakka, H. (Project Member), Airaksinen, M. (Project Member), Bollepalli, B. (Project Member), Saeidi, R. (Project Member), Gowda, D. (Project Member) & Jokinen, E. (Project Member)

    01/09/2012 to 31/08/2016

    Project: Academy of Finland: Other research funding
