Techniques for speech intelligibility enhancement in mobile telephony

Emma Jokinen

Research output: Thesis › Doctoral Thesis › Collection of Articles

Abstract

Today's consumers can use their mobile telephony devices almost anywhere and at any time. This means that speech communication is often disturbed by environmental background noise, making it hard for the listener to understand what the speaker is saying. To further aggravate the situation, the listener and the speaker are typically in different locations when the communication is taking place. This means that without listener feedback the speaker is unable to adjust his or her speaking style to fit the listening environment, as is normally done in face-to-face communication situations. However, speech communication by mobile telephony in noisy conditions can be improved using intelligibility enhancement technology. This thesis contributes to the development of intelligibility enhancement techniques that can in principle be applied in real-time speech communication in a mobile device. The algorithms are intended to be used in a post-processing block in the receiving device to combat near-end noise in the listener's environment. The target application places tight restrictions on the algorithmic delay, which means that frame-based processing in short time frames (for instance, 10 to 20 ms in length) must be employed. Several algorithms for intelligibility improvement are proposed and their performance is demonstrated with subjective tests using simulated telephone speech. The majority of the introduced algorithms aim to mimic modifications that human speakers naturally employ when talking in noisy situations. In addition, a feature extraction technique that can be used to estimate the spectral tilt caused by the glottal excitation from telephone speech is proposed. Finally, the impact of noisy far-end conditions on post-processing in the receiving device is investigated. In general, the proposed post-processing techniques show clear intelligibility improvement over unprocessed telephone speech, ranging up to a 40 percentage point reduction in word-error rates.
Translated title of the contribution: Tekniikoita puheen ymmärrettävyyden parantamiseen mobiililaitteissa
Original language: English
Qualification: Doctor's degree
Awarding Institution
  • Aalto University
Supervisors/Advisors
  • Alku, Paavo, Supervisor
Publisher
Print ISBNs: 978-952-60-7679-9
Electronic ISBNs: 978-952-60-7684-3
Publication status: Published - 2017
MoE publication type: G5 Doctoral dissertation (article)

Keywords

  • speech intelligibility enhancement
  • telephone speech
  • near-end noise
  • human speech production
