Today's consumers can use their mobile telephony devices almost anywhere and at any time. As a result, speech communication is often disturbed by environmental background noise, making it hard for the listener to understand what the speaker is saying. To further aggravate the situation, the listener and the speaker are typically in different locations when the communication takes place. Without listener feedback, the speaker is therefore unable to adjust his or her speaking style to fit the listening environment, as is normally done in face-to-face communication. However, speech communication by mobile telephony in noisy conditions can be improved using intelligibility enhancement technology. This thesis contributes to the development of intelligibility enhancement techniques that can in principle be applied in real-time speech communication in a mobile device. The algorithms are intended to be used in a post-processing block in the receiving device to combat near-end noise in the listener's environment. The target application places tight restrictions on the algorithmic delay, which means that frame-based processing in short time frames (for instance, 10 to 20 ms in length) must be employed. Several algorithms for intelligibility improvement are proposed, and their performance is demonstrated with subjective tests using simulated telephone speech. The majority of the introduced algorithms aim to mimic modifications that human speakers naturally employ when talking in noisy situations. In addition, a feature extraction technique is proposed for estimating, from telephone speech, the spectral tilt caused by the glottal excitation. Finally, the impact of noisy far-end conditions on post-processing in the receiving device is investigated. In general, the proposed post-processing techniques show clear intelligibility improvement over unprocessed telephone speech, reducing word-error rates by up to 40 percentage points.
|Field|Value|
|---|---|
|Translated title of the contribution|Tekniikoita puheen ymmärrettävyyden parantamiseen mobiililaitteissa|
|Publication status|Published - 2017|
|MoE publication type|G5 Doctoral dissertation (article)|
- speech intelligibility enhancement
- telephone speech
- near-end noise
- human speech production