A linear predictive spectral estimation method based on higher-lag autocorrelation coefficients is proposed for noise-robust feature extraction from speech. The method, called higher-lag linear prediction, is derived from a signal prediction model that is optimized in the mean square sense using a cost function with two prediction error terms: the first is similar to that of conventional linear prediction, and the second is a version of it delayed by an integer number of samples, M. This basic form is developed further into the combined higher-lag linear prediction (CHLLP) model, which takes advantage of the zero-lag and higher-lag predictions simultaneously. The CHLLP model was used in the computation of mel-frequency cepstral coefficients and compared with several reference feature extraction methods in speaker recognition. The experiments were conducted with a modern i-vector-based system. Noisy conditions were produced both by adding car, babble, and factory noise at different signal-to-noise ratios and by using speech recorded in real noisy environments. The results indicate that CHLLP outperformed the reference feature extraction methods in almost all comparisons in the noise-corrupted conditions, and that its performance was only slightly inferior to that of nonparametric FFT-based spectral modeling in the clean condition.
Keywords: linear prediction, mismatch, robust feature extraction, speaker recognition
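
To give some intuition for why higher-lag autocorrelation coefficients can help in noise, the following sketch may be useful. It does not implement the cost function or the CHLLP model described in the abstract, whose exact forms are not given here. Instead it uses the classical higher-order Yule-Walker equations as a stand-in: the predictor is solved from autocorrelation lags shifted by M, so that for M at least equal to the prediction order the zero-lag coefficient, where additive white noise accumulates, is never used. The function names, the AR(2) test process, and the noise variance are all assumptions chosen for illustration.

```python
import numpy as np

def ar_autocorr(a, n_lags, noise_var=0.0):
    """Autocorrelation r[0..n_lags] of the AR process
    x[n] = sum_k a[k-1] * x[n-k] + u[n] (unit-variance white u),
    computed from a long truncated impulse response.
    Additive white observation noise only changes lag 0."""
    a = np.asarray(a, dtype=float)
    n = 4096  # long enough for the impulse response of a stable model to die out
    h = np.zeros(n)
    for t in range(n):
        acc = 1.0 if t == 0 else 0.0
        for k in range(1, len(a) + 1):
            if t - k >= 0:
                acc += a[k - 1] * h[t - k]
        h[t] = acc
    r = np.array([np.dot(h[: n - m], h[m:]) for m in range(n_lags + 1)])
    r[0] += noise_var
    return r

def higher_lag_yule_walker(r, p, M=0):
    """Solve sum_k a_k r(M + i - k) = r(M + i), i = 1..p.
    M = 0 gives the conventional autocorrelation-method normal equations.
    Illustrative stand-in, not the HLLP/CHLLP formulation of the paper."""
    r = np.asarray(r, dtype=float)
    i = np.arange(1, p + 1)
    lags = M + i[:, None] - i[None, :]   # lag used in row i, column k
    R = r[np.abs(lags)]                  # r(-m) = r(m) for real signals
    rhs = r[M + i]
    return np.linalg.solve(R, rhs)

# Hypothetical stable AR(2) model and white-noise level.
a_true = np.array([1.2, -0.6])
r_clean = ar_autocorr(a_true, n_lags=6)
r_noisy = ar_autocorr(a_true, n_lags=6, noise_var=1.0)

print("clean, M=0:", higher_lag_yule_walker(r_clean, p=2, M=0))
print("noisy, M=0:", higher_lag_yule_walker(r_noisy, p=2, M=0))
print("noisy, M=2:", higher_lag_yule_walker(r_noisy, p=2, M=2))
```

In this idealized setting the M = 0 solution is biased by the noise, while the M = 2 solution is not, because none of the lags it uses is zero. Real noises such as car, babble, and factory noise are not white, and autocorrelation estimates at higher lags from short frames are less reliable, which is one plausible motivation for combining zero-lag and higher-lag information as the abstract describes.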