Time-regularized linear prediction for noise-robust extraction of the spectral envelope of speech

Research output: Article in book/conference proceedings › Conference contribution › Scientific › Peer-reviewed

1 Citation (Scopus)
125 Downloads (Pure)

Abstract

Feature extraction of speech signals is typically performed in short-time frames by assuming that the signal is stationary within each frame. For the extraction of the spectral envelope of speech, which conveys the formant frequencies produced by the resonances of the slowly varying vocal tract, a commonly used frame length is 20-30 ms. However, this kind of conventional frame-based spectral analysis is oblivious to the broader temporal context of the signal and is prone to degradation by, for example, environmental noise. In this paper, we propose a new frame-based linear prediction (LP) analysis method that includes a regularization term that penalizes energy differences in consecutive frames of an all-pole spectral envelope model. This integrates the slowly varying nature of the vocal tract as a part of the analysis. Objective evaluations related to feature distortion and phonetic representational capability were performed by studying the properties of the mel-frequency cepstral coefficient (MFCC) representations computed from different spectral estimation methods under noisy conditions using the TIMIT database. The results show that the proposed time-regularized LP approach exhibits superior MFCC distortion behavior while simultaneously having the greatest average separability of different phoneme categories in comparison to the other methods.
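
The general idea of coupling consecutive frames in LP analysis can be illustrated with a minimal sketch. This is not the formulation used in the paper: the abstract does not give the exact regularization term, so the example below swaps in a simpler quadratic penalty, `lam * ||a_t - a_{t-1}||^2`, on the difference between the predictor coefficients of the current and the previous frame, solved one frame at a time. The function names (`autocorr`, `regularized_lp`), the Hamming window, the scaling of the penalty by frame energy, and all parameter values are assumptions made for illustration only.

```python
import numpy as np


def autocorr(frame, order):
    """Autocorrelation lags 0..order of a single (already windowed) frame."""
    n = len(frame)
    return np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])


def regularized_lp(frames, order=10, lam=0.0):
    """Frame-wise LP predictor coefficients with a smoothness penalty.

    For each frame t, minimizes (assumed, simplified objective)
        a^T R_t a - 2 r_t^T a + w_t * ||a - a_{t-1}||^2,
    where w_t = lam * r_t[0]. With lam = 0 this reduces to conventional
    autocorrelation-method LP. The first frame is never penalized.
    """
    frames = np.asarray(frames, dtype=float)
    window = np.hamming(frames.shape[1])
    # Index matrix |j - k| used to build the Toeplitz autocorrelation matrix.
    idx = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    eye = np.eye(order)
    prev = np.zeros(order)
    coeffs = []
    for t, frame in enumerate(frames):
        r = autocorr(frame * window, order)
        R = r[idx] + 1e-9 * r[0] * eye  # tiny diagonal loading for stability
        w = lam * r[0] if t > 0 else 0.0
        # Normal equations of the penalized problem: (R + w I) a = r[1:] + w a_prev
        a = np.linalg.solve(R + w * eye, r[1:] + w * prev)
        coeffs.append(a)
        prev = a
    return np.array(coeffs)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noisy_frames = rng.standard_normal((6, 400))  # stand-in for framed speech
    plain = regularized_lp(noisy_frames, order=8, lam=0.0)
    smooth = regularized_lp(noisy_frames, order=8, lam=1.0)
    print("mean |frame-to-frame change|, lam=0:", np.abs(np.diff(plain, axis=0)).mean())
    print("mean |frame-to-frame change|, lam=1:", np.abs(np.diff(smooth, axis=0)).mean())
```

Increasing `lam` pulls each frame's coefficients toward those of the previous frame, which is one simple way to encode the assumption that the vocal tract varies slowly. A penalty defined on the all-pole envelope itself, as described in the abstract, would behave differently from this coefficient-domain stand-in.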
Original language: English
Title of host publication: Proceedings of Interspeech
Publisher: International Speech Communication Association
Pages: 701-705
Number of pages: 5
DOI: https://doi.org/10.21437/Interspeech.2018-1230
Publication status: Published - 2 September 2018
MoE publication type: A4 Article in conference proceedings
Event: Interspeech - Hyderabad International Convention Centre, Hyderabad, India
Duration: 2 September 2018 to 6 September 2018
http://interspeech2018.org/

Publication series

Name: Interspeech - Annual Conference of the International Speech Communication Association
Publisher: International Speech Communication Association
ISSN (electronic): 2308-457X

Conference

Conference: Interspeech
Country: India
City: Hyderabad
Period: 02/09/2018 to 06/09/2018

  • Projects

    • 2 Finished

    Interdisciplinary research project on parametric speech synthesis (Poikkitieteellinen parametrisen puhesynteesin tutkimusprojekti)

    Murtola, T., Bollepalli, B., Nonavinakere Prabhakera, N., Juvela, L., Airaksinen, M., Bäckström, T. & Alku, P.

    01/01/2018 to 24/01/2020

    Project: Academy of Finland: Other research funding

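    Development of personalized speech synthesis for assistive technology for people with speech impairments (Personoidun puhesynteesin kehittäminen puhevammaisten apuvälineteknologiaan)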

    Juvela, L., Airaksinen, M., Bollepalli, B., Pohjalainen, J., Jokinen, E., Gowda, D. & Alku, P.

    01/09/2012 to 31/08/2016

    Project: Academy of Finland: Other research funding

    Cite this

    Airaksinen, M., Juvela, L., Räsänen, O., & Alku, P. (2018). Time-regularized linear prediction for noise-robust extraction of the spectral envelope of speech. In Proceedings of Interspeech (pp. 701-705). (Interspeech - Annual Conference of the International Speech Communication Association). International Speech Communication Association. https://doi.org/10.21437/Interspeech.2018-1230