Intelligibility enhancement of telephone speech using Gaussian process regression for normal-to-Lombard spectral tilt conversion

Research output: Journal article



  • University of Helsinki


Noise in the environment can decrease the quality and intelligibility of a telephone conversation. This study focuses on the intelligibility enhancement of narrowband telephone speech in a near-end noise scenario using a post-processing method based on normal-to-Lombard spectral tilt conversion. The proposed technique uses nonparallel, conversational normal and Lombard speech together with Gaussian process regression in order to mimic the flattening of the spectral tilt that occurs in the production of natural speech in noisy conditions. The performance of the proposed method was evaluated in comparison to two reference methods, a fixed high-pass filter and a baseline spectral tilt conversion, as well as in comparison to unprocessed speech in terms of intelligibility and listening preference in noisy conditions and in terms of pressedness in silent conditions. The results indicate that while the proposed technique provides a similar benefit in terms of intelligibility as fixed high-pass filtering, it is also able to produce a notable increase in pressedness. This suggests that the developed processing of the spectral tilt can compete with fixed high-pass filtering in intelligibility enhancement, but it is also able to convert speech to become perceptually closer to natural Lombard speech.
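One of the reference methods named in the abstract is a fixed high-pass filter. The sketch below illustrates the general idea of flattening spectral tilt with such a filter. It is not the filter used in the study: the first-order pre-emphasis form, the coefficient `alpha = 0.95`, the test tone frequencies, and the helper names `pre_emphasis` and `rms` are all assumptions chosen for illustration. The 8 kHz sampling rate is the usual rate for narrowband telephone speech.

```python
import math


def pre_emphasis(signal, alpha=0.95):
    """First-order high-pass filter: y[n] = x[n] - alpha * x[n-1].

    The sample before the start of the signal is treated as 0.
    alpha is an illustrative value, not one taken from the study.
    """
    out = []
    prev = 0.0
    for x in signal:
        out.append(x - alpha * prev)
        prev = x
    return out


def rms(signal):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(v * v for v in signal) / len(signal))


fs = 8000  # sampling rate in Hz (narrowband telephony)
n = 800    # 100 ms of signal

# Two hypothetical test tones: one low in the band, one high in the band.
low = [math.sin(2 * math.pi * 200 * t / fs) for t in range(n)]
high = [math.sin(2 * math.pi * 3000 * t / fs) for t in range(n)]

# Ratio of output level to input level for each tone.
low_gain = rms(pre_emphasis(low)) / rms(low)
high_gain = rms(pre_emphasis(high)) / rms(high)

print(f"gain at 200 Hz:  {low_gain:.2f}")
print(f"gain at 3000 Hz: {high_gain:.2f}")
```

The low tone is attenuated and the high tone is boosted, which shifts energy toward higher frequencies and so flattens the spectral tilt. A fixed filter applies the same shaping to every frame, whereas the proposed method, as the abstract describes it, uses Gaussian process regression to convert the tilt from normal toward Lombard speech.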


Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Status: Published - 2017
OKM publication type: A1 Journal article, refereed

ID: 15225740