Feature Extraction Using Power-Law Adjusted Linear Prediction With Application to Speaker Recognition Under Severe Vocal Effort Mismatch

Research output: Journal article

Researchers

Organisations

  • International Audio Laboratories Erlangen

Description

Linear prediction is one of the most established techniques in signal estimation and is widely used in speech signal processing. It has long been understood that the nerve firing rate of the human auditory system can be approximated by a power-law nonlinearity, and this has motivated the use of perceptual linear prediction for extracting acoustic features in a variety of speech processing applications. In this paper, we revisit the application of the power-law nonlinearity to speech spectrum estimation by compressing or expanding the power spectrum in autocorrelation-based linear prediction. The development of the so-called LP-alpha method is motivated by a desire to obtain spectral features that exhibit less mismatch than those of conventionally used spectrum estimation methods when speech of normal loudness is compared with speech produced under vocal effort. The effectiveness of the proposed approach is demonstrated in a speaker recognition task conducted under severe vocal effort mismatch, comparing shouted with normal speech.
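
The idea of compressing or expanding the power spectrum before autocorrelation-based linear prediction can be illustrated with a minimal sketch. It is not taken from the paper: the function name `lp_alpha`, the Hamming window, the FFT length, the model order, and the small diagonal regularisation are all assumptions made for illustration, and the published LP-alpha method may differ in these details. The sketch raises the frame's power spectrum to an exponent `alpha`, converts the result to an autocorrelation sequence with an inverse FFT, and solves the usual normal equations.

```python
import numpy as np

def lp_alpha(frame, order=20, alpha=1.0, nfft=1024):
    """Hypothetical sketch of power-law adjusted linear prediction.

    Returns the coefficients [1, -a_1, ..., -a_p] of the inverse filter A(z).
    All parameter defaults are illustrative assumptions, not values from the paper.
    """
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hamming(len(frame))

    # Power spectrum of the windowed frame (nfft >= 2*len(frame) - 1 avoids
    # time-domain aliasing of the autocorrelation when alpha == 1).
    power = np.abs(np.fft.rfft(windowed, nfft)) ** 2

    # Power-law adjustment: alpha < 1 compresses, alpha > 1 expands,
    # alpha == 1 reduces to conventional autocorrelation-based LP.
    adjusted = power ** alpha

    # Autocorrelation as the inverse FFT of the (adjusted) power spectrum.
    r = np.fft.irfft(adjusted, nfft)[: order + 1]

    # Toeplitz normal equations R a = r[1:], with a tiny diagonal load
    # (an assumption added here for numerical safety).
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += 1e-9 * r[0] * np.eye(order)
    a = np.linalg.solve(R, r[1 : order + 1])

    return np.concatenate(([1.0], -a))
```

For example, `lp_alpha(frame, order=20, alpha=0.5)` would return predictor coefficients fitted to a compressed spectrum of `frame`. A Levinson-Durbin recursion could replace the direct solve; the direct solve is used here only to keep the sketch short.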

Details

Original language: English
Pages: 42-53
Number of pages: 12
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume: 24
Issue number: 1
Publication status: Published - January 2016
Ministry of Education and Culture (OKM) publication type: A1 Journal article, refereed

ID: 1499624