Abstract
Noise in the environment can decrease the quality and intelligibility of a telephone conversation. This study focuses on the intelligibility enhancement of narrowband telephone speech in a near-end noise scenario using a post-processing method based on normal-to-Lombard spectral tilt conversion. The proposed technique uses nonparallel, conversational normal and Lombard speech together with Gaussian process regression to mimic the flattening of the spectral tilt that occurs in the production of natural speech in noisy conditions. The proposed method was compared with two reference methods (a fixed high-pass filter and a baseline spectral tilt conversion) and with unprocessed speech, in terms of intelligibility and listening preference in noisy conditions and in terms of pressedness in silent conditions. The results indicate that while the proposed technique provides an intelligibility benefit similar to that of fixed high-pass filtering, it also produces a notable increase in pressedness. This suggests that the developed processing of the spectral tilt can compete with fixed high-pass filtering in intelligibility enhancement while also converting speech to be perceptually closer to natural Lombard speech.
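As a rough illustration of what a fixed high-pass reference does to spectral tilt, the sketch below applies a first-order pre-emphasis filter to a low-frequency and a high-frequency tone and compares the resulting gains. It is only a minimal sketch under stated assumptions: the filter form, the coefficient `alpha`, the test frequencies, and the helper names are hypothetical choices made for illustration and are not taken from the study, whose actual filter and Gaussian process regression model are not described in this record.

```python
import math

def pre_emphasis(signal, alpha=0.95):
    """First-order high-pass filter y[n] = x[n] - alpha * x[n-1].

    alpha is an illustrative value, not a parameter from the study.
    """
    out = []
    prev = 0.0
    for x in signal:
        out.append(x - alpha * prev)
        prev = x
    return out

def gain_db(freq_hz, sample_rate=8000, alpha=0.95):
    """Theoretical magnitude response of the filter above, in dB."""
    w = 2.0 * math.pi * freq_hz / sample_rate
    mag = math.sqrt(1.0 + alpha * alpha - 2.0 * alpha * math.cos(w))
    return 20.0 * math.log10(mag)

def rms(values):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(v * v for v in values) / len(values))

sample_rate = 8000  # typical sampling rate for narrowband telephone speech

# One second each of a 300 Hz and a 3000 Hz tone (hypothetical test signals).
low = [math.sin(2 * math.pi * 300 * n / sample_rate) for n in range(sample_rate)]
high = [math.sin(2 * math.pi * 3000 * n / sample_rate) for n in range(sample_rate)]

# Measured gain of the filter on each tone, in dB.
low_gain = 20 * math.log10(rms(pre_emphasis(low)) / rms(low))
high_gain = 20 * math.log10(rms(pre_emphasis(high)) / rms(high))

print(f"gain at 300 Hz:  {low_gain:.1f} dB")
print(f"gain at 3000 Hz: {high_gain:.1f} dB")
```

The filter attenuates the low tone and boosts the high tone, which flattens a downward-sloping spectrum. A fixed filter of this kind applies the same correction to every frame, whereas the conversion approach described in the abstract learns the tilt change from normal and Lombard speech data.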
Original language | English |
---|---|
Pages (from-to) | 1985-1996 |
Number of pages | 12 |
Journal | IEEE/ACM Transactions on Audio, Speech, and Language Processing |
Volume | 25 |
Issue number | 10 |
Publication status | Published - 2017 |
MoE publication type | A1 Journal article-refereed |
Keywords
- Databases
- Intelligibility
- Lombard effect
- Noise measurement
- Speech
- Speech enhancement
- Telephone sets
- Telephone speech
- Training