Phase modification for increasing the intelligibility of telephone speech in near-end noise conditions – evaluation of two methods

Research output: Contribution to journal › Article › Scientific › peer-review

Standard

Phase modification for increasing the intelligibility of telephone speech in near-end noise conditions – evaluation of two methods. / Jokinen, Emma; Pulakka, Hannu; Alku, Paavo.

In: Speech Communication, Vol. 83, 01.10.2016, p. 64-80.

BibTeX

@article{a622f0202dc34d62b0310c4861abd738,
title = "Phase modification for increasing the intelligibility of telephone speech in near-end noise conditions – evaluation of two methods",
abstract = "In this study, two intelligibility-increasing post-processing methods based on the modification of the phase spectrum of speech are proposed for near-end noise conditions. One of the algorithms aims to reduce the dynamic range of the signal and take advantage of the energy gain resulting from amplitude normalization to increase the loudness, while the other algorithm is designed to sharpen the high-amplitude peaks in the time-domain signal generated by the periodic glottal excitation to make the speech sound more clear. Both methods are based on first modifying only the phase spectrum, after which the time-domain signal is computed using the inverse Fourier transform. Finally, the time-domain signal is amplitude normalized by scaling its sample values so that they occupy the original amplitude range of the processed frame. The performance of the proposed methods was evaluated by first comparing them to unprocessed speech using objective quality measures as well as subjective loudness and listening preference tests. Based on the results of these evaluations, the phase-modification methods were further compared to unprocessed speech and dynamic range compression using subjective word-error rate and quality tests. Both narrowband and wideband speech from several talkers were included in both evaluations. Both of the methods were able to increase loudness in some bandwidth conditions as well as outperform unprocessed speech and dynamic range compression in terms of intelligibility in high-noise levels. Both of the methods were rated lower in quality than unprocessed speech in clean conditions. In background noise, however, where intelligibility enhancement algorithms are mostly used, both methods achieved similar results to unprocessed speech in terms of listening preference in some of the bandwidth conditions tested.",
keywords = "Intelligibility enhancement, Listening effort, Loudness, Phase modification, Telephone speech",
author = "Emma Jokinen and Hannu Pulakka and Paavo Alku",
year = "2016",
month = "10",
day = "1",
doi = "10.1016/j.specom.2016.08.001",
language = "English",
volume = "83",
pages = "64--80",
journal = "Speech Communication",
issn = "0167-6393",
publisher = "Elsevier",
}
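
The processing chain summarized in the abstract above (phase-only modification, inverse Fourier transform, amplitude normalization) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the two phase modifications, so `randomize_phase` is a hypothetical stand-in, the function names are assumptions, and "occupy the original amplitude range" is interpreted here as matching the peak absolute amplitude of the frame.

```python
import numpy as np


def phase_modify_frame(frame, modify_phase):
    """Change only the phase spectrum of one frame, then rescale the result
    so its peak absolute amplitude matches that of the input frame."""
    frame = np.asarray(frame, dtype=float)
    spectrum = np.fft.rfft(frame)
    magnitude = np.abs(spectrum)   # magnitude spectrum is left untouched
    phase = np.angle(spectrum)

    # Hypothetical hook: any function mapping a phase array to a new phase array.
    new_phase = modify_phase(phase)

    # Back to the time domain with the inverse Fourier transform.
    modified = np.fft.irfft(magnitude * np.exp(1j * new_phase), n=len(frame))

    # Amplitude normalization (assumed interpretation: match the peak value).
    peak_in = np.max(np.abs(frame))
    peak_out = np.max(np.abs(modified))
    if peak_out == 0.0:
        return modified
    return modified * (peak_in / peak_out)


def randomize_phase(phase, seed=0):
    """Placeholder phase modification (an assumption, not from the paper):
    replace every phase value with a uniformly random one."""
    rng = np.random.default_rng(seed)
    return rng.uniform(-np.pi, np.pi, size=phase.shape)


# A single impulse is the most "peaky" frame possible.
frame = np.zeros(256)
frame[128] = 1.0

processed = phase_modify_frame(frame, randomize_phase)
energy_gain = np.sum(processed ** 2) / np.sum(frame ** 2)

print("peak before:", np.max(np.abs(frame)))
print("peak after: ", np.max(np.abs(processed)))
print("energy gain:", energy_gain)
```

In this toy case the random phase spreads the impulse's energy across the frame, so rescaling to the original peak raises the frame energy. That is the kind of energy gain from amplitude normalization the abstract attributes to the first method; the actual phase modifications used in the paper may differ considerably.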

RIS

TY  - JOUR
T1  - Phase modification for increasing the intelligibility of telephone speech in near-end noise conditions – evaluation of two methods
AU  - Jokinen, Emma
AU  - Pulakka, Hannu
AU  - Alku, Paavo
PY  - 2016/10/1
Y1  - 2016/10/1
AB  - In this study, two intelligibility-increasing post-processing methods based on the modification of the phase spectrum of speech are proposed for near-end noise conditions. One of the algorithms aims to reduce the dynamic range of the signal and take advantage of the energy gain resulting from amplitude normalization to increase the loudness, while the other algorithm is designed to sharpen the high-amplitude peaks in the time-domain signal generated by the periodic glottal excitation to make the speech sound more clear. Both methods are based on first modifying only the phase spectrum, after which the time-domain signal is computed using the inverse Fourier transform. Finally, the time-domain signal is amplitude normalized by scaling its sample values so that they occupy the original amplitude range of the processed frame. The performance of the proposed methods was evaluated by first comparing them to unprocessed speech using objective quality measures as well as subjective loudness and listening preference tests. Based on the results of these evaluations, the phase-modification methods were further compared to unprocessed speech and dynamic range compression using subjective word-error rate and quality tests. Both narrowband and wideband speech from several talkers were included in both evaluations. Both of the methods were able to increase loudness in some bandwidth conditions as well as outperform unprocessed speech and dynamic range compression in terms of intelligibility in high-noise levels. Both of the methods were rated lower in quality than unprocessed speech in clean conditions. In background noise, however, where intelligibility enhancement algorithms are mostly used, both methods achieved similar results to unprocessed speech in terms of listening preference in some of the bandwidth conditions tested.
KW  - Intelligibility enhancement
KW  - Listening effort
KW  - Loudness
KW  - Phase modification
KW  - Telephone speech
UR  - http://www.scopus.com/inward/record.url?scp=84981313499&partnerID=8YFLogxK
U2  - 10.1016/j.specom.2016.08.001
DO  - 10.1016/j.specom.2016.08.001
M3  - Article
VL  - 83
SP  - 64
EP  - 80
JO  - Speech Communication
JF  - Speech Communication
SN  - 0167-6393
ER  - 

ID: 6982294