The linear predictive modeling of speech from higher-lag autocorrelation coefficients applied to noise-robust speaker recognition

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

A linear predictive spectral estimation method based on higher-lag autocorrelation coefficients is proposed for noise-robust feature extraction from speech. The method, called higher-lag linear prediction, is derived from a signal prediction model that is optimized in the mean square sense using a cost function with two prediction error terms: the first is similar to that of conventional linear prediction, and the second is a delayed version introducing an integer delay of M samples. This basic form is developed further into the combined higher-lag linear prediction (CHLLP) model, which simultaneously takes advantage of the zero-lag and higher-lag predictions. The CHLLP model was used to compute mel-frequency cepstral coefficients (MFCCs) and was compared with several reference feature extraction methods in speaker recognition. The experiments were conducted using a modern i-vector-based system. Noise corruption was introduced both by adding car, babble, and factory noise at different signal-to-noise ratios and by using speech recorded in real noisy conditions. The results indicate that CHLLP outperformed the reference feature extraction methods in almost all comparisons under noise-corrupted conditions, and that its performance was only slightly inferior to nonparametric FFT-based spectral modeling in the clean condition.
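The abstract does not give the HLLP or CHLLP equations, so the sketch below does not implement either of them. It only illustrates one common rationale for moving away from the zero lag, under an assumed idealization: additive white noise that is uncorrelated with the signal changes, in expectation, only the zero-lag autocorrelation r(0) (by the noise variance) and leaves r(k) for k ≥ 1 unchanged. Real car, babble, and factory noise are not white, so this is a simplification. The sketch contrasts conventional autocorrelation-method LP, whose normal equations place r(0) on the diagonal, with the classical high-order (modified) Yule-Walker equations, a different, swapped-in technique that likewise never uses r(0). The synthetic AR(2) signal, the 0 dB noise level, and all function names (`autocorr`, `lp_conventional`, `lp_high_order_yw`, `simulate_ar2`) are hypothetical choices for illustration.

```python
import numpy as np

# Hypothetical illustration only: NOT the HLLP/CHLLP method of the article.


def autocorr(x: np.ndarray, max_lag: int) -> np.ndarray:
    """Biased autocorrelation estimate r(k) for k = 0..max_lag."""
    n = len(x)
    return np.array([np.dot(x[k:], x[: n - k]) / n for k in range(max_lag + 1)])


def lp_conventional(r: np.ndarray, p: int) -> np.ndarray:
    """Autocorrelation-method LP: solve R a = [r(1), ..., r(p)] with
    R[i, j] = r(|i - j|), so the zero-lag value r(0) sits on the diagonal."""
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    return np.linalg.solve(R, r[1 : p + 1])


def lp_high_order_yw(r: np.ndarray, p: int) -> np.ndarray:
    """High-order (modified) Yule-Walker: use the AR(p) relations
    r(k) = sum_i a_i r(k - i) only for k = p+1..2p, so r(0) is never used."""
    R = np.array([[r[k - i] for i in range(1, p + 1)] for k in range(p + 1, 2 * p + 1)])
    return np.linalg.solve(R, r[p + 1 : 2 * p + 1])


def simulate_ar2(a1: float, a2: float, n: int, rng: np.random.Generator) -> np.ndarray:
    """Generate x(t) = a1 x(t-1) + a2 x(t-2) + e(t) with unit-variance white e(t)."""
    burn_in = 1000
    e = rng.standard_normal(n + burn_in)
    x = np.zeros(n + burn_in)
    for t in range(2, n + burn_in):
        x[t] = a1 * x[t - 1] + a2 * x[t - 2] + e[t]
    return x[burn_in:]  # drop the start-up transient


rng = np.random.default_rng(0)
a_true = np.array([1.5, -0.8])  # stable AR(2), pole radius sqrt(0.8), about 0.89
clean = simulate_ar2(a_true[0], a_true[1], 200_000, rng)  # variance is about 9.1
noisy = clean + 3.0 * rng.standard_normal(clean.size)  # white noise, variance 9 (~0 dB)

p = 2
r_clean = autocorr(clean, 2 * p)
r_noisy = autocorr(noisy, 2 * p)

print("r(0) shift:", r_noisy[0] - r_clean[0])  # expected near the noise variance
print("r(1) shift:", r_noisy[1] - r_clean[1])  # expected near zero
a_conv = lp_conventional(r_noisy, p)
a_high = lp_high_order_yw(r_noisy, p)
print("conventional LP on noisy signal:", a_conv)
print("high-order Yule-Walker on noisy signal:", a_high)
```

In this idealized setting the conventional estimate is pulled far from `a_true` because the inflated r(0) enters every equation, whereas the high-order estimate stays close. The AR(2) choice keeps the algebra small; on short, non-stationary speech frames the higher-lag estimates would be considerably noisier, which is one reason a practical method may combine zero-lag and higher-lag information rather than discard r(0) entirely.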

Details

Original language: English
Pages (from-to): 1606-1617
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume: 25
Issue number: 8
Publication status: Published (1 Aug 2017)
MoE publication type: A1 Journal article-refereed

Research areas

• linear prediction, mismatch, robust feature extraction, speaker recognition
