Intelligibility enhancement of telephone speech using Gaussian process regression for normal-to-Lombard spectral tilt conversion

Emma Jokinen, Ulpu Remes, Paavo Alku

Research output: Contribution to journal › Article › Scientific › peer-review

13 Citations (Scopus)


Noise in the environment can decrease the quality and intelligibility of a telephone conversation. This study focuses on the intelligibility enhancement of narrowband telephone speech in a near-end noise scenario using a post-processing method based on normal-to-Lombard spectral tilt conversion. The proposed technique uses nonparallel, conversational normal and Lombard speech together with Gaussian process regression to mimic the flattening of the spectral tilt that occurs when natural speech is produced in noisy conditions. The proposed method was compared with two reference methods, a fixed high-pass filter and a baseline spectral tilt conversion, and with unprocessed speech. The comparison covered intelligibility and listening preference in noisy conditions and pressedness in silent conditions. The results indicate that the proposed technique provides an intelligibility benefit similar to that of fixed high-pass filtering while also producing a notable increase in pressedness. This suggests that the developed processing of the spectral tilt can compete with fixed high-pass filtering in intelligibility enhancement and, in addition, brings the converted speech perceptually closer to natural Lombard speech.
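To make the notion of spectral tilt flattening concrete, the following is a minimal sketch of the idea behind the fixed high-pass reference, not the Gaussian process regression method itself. Everything specific in it is an assumption made for illustration: the tilt measure (slope of a straight line fitted to the log-magnitude spectrum), the first-order filter with coefficient 0.95, the 8 kHz sampling rate, and the synthetic low-pass noise that stands in for speech. The abstract does not state which filter or tilt measure the study used.

```python
import numpy as np


def spectral_tilt_db_per_khz(signal, sample_rate):
    """Slope (dB per kHz) of a line fitted to the log-magnitude spectrum."""
    windowed = signal * np.hanning(len(signal))
    magnitude = np.abs(np.fft.rfft(windowed))
    freqs_khz = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate) / 1000.0
    level_db = 20.0 * np.log10(magnitude + 1e-12)  # small offset avoids log(0)
    slope, _intercept = np.polyfit(freqs_khz, level_db, deg=1)
    return slope


def first_order_high_pass(signal, coefficient=0.95):
    """Fixed first-order high-pass filter: y[n] = x[n] - coefficient * x[n-1]."""
    out = np.copy(signal)
    out[1:] = signal[1:] - coefficient * signal[:-1]
    return out


sample_rate = 8000  # assumed narrowband telephone sampling rate
rng = np.random.default_rng(0)

# Synthetic stand-in for speech (not real data): white noise passed through
# a one-pole low-pass filter, which gives a spectrum that falls with frequency.
white = rng.standard_normal(4096)
toy_signal = np.zeros_like(white)
for n in range(1, len(white)):
    toy_signal[n] = 0.9 * toy_signal[n - 1] + white[n]

tilt_before = spectral_tilt_db_per_khz(toy_signal, sample_rate)
tilt_after = spectral_tilt_db_per_khz(first_order_high_pass(toy_signal), sample_rate)

print(f"tilt before filtering: {tilt_before:.2f} dB/kHz")
print(f"tilt after filtering:  {tilt_after:.2f} dB/kHz")
```

A fixed filter of this kind applies the same tilt change to every frame. The proposed method, as described in the abstract, instead uses a regression model learned from normal and Lombard speech to determine the conversion.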

Original language: English
Pages (from-to): 1985-1996
Number of pages: 12
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Issue number: 10
Publication status: Published - 2017
MoE publication type: A1 Journal article-refereed


Keywords

  • Databases
  • Intelligibility
  • Lombard effect
  • Noise measurement
  • Speech
  • Speech enhancement
  • Telephone sets
  • Telephone speech
  • Training

