Abstract
Lombard speech is a speaking style associated with increased vocal effort that humans naturally use to improve intelligibility in the presence of noise. It is hence desirable to have a system capable of converting speech from normal to Lombard style. Moreover, it would be useful to be able to adjust the degree of Lombardness in the converted speech so that the system can adapt to different noise environments. In this study, we propose the use of recently developed augmented cycle-consistent adversarial networks (Augmented CycleGANs) for conversion between normal and Lombard speaking styles. The proposed system provides smooth control over the degree of Lombardness of the mapped utterances by traversing different points in the latent space of the trained model. We use a parametric approach in which the Pulse Model in Log domain (PML) vocoder extracts features from normal speech, and the Augmented CycleGAN then maps these to Lombard-style features. Finally, the mapped features are converted to Lombard speech with PML. The model is trained on multi-language data recorded in different noise conditions, and we compare its effectiveness to that of a previously proposed CycleGAN system in experiments on the intelligibility and quality of the mapped speech.
Original language | English |
---|---|
Title of host publication | Proceedings of Interspeech |
Publisher | International Speech Communication Association |
Pages | 2838-2842 |
DOIs | |
Publication status | Published - 2019 |
MoE publication type | A4 Article in a conference publication |
Event | Interspeech, Graz, Austria; Duration: 15 Sept 2019 → 19 Sept 2019; https://www.interspeech2019.org/ |
Publication series
Name | Interspeech - Annual Conference of the International Speech Communication Association |
---|---|
ISSN (Electronic) | 2308-457X |
Conference
Conference | Interspeech |
---|---|
Country/Territory | Austria |
City | Graz |
Period | 15/09/2019 → 19/09/2019 |
Internet address | https://www.interspeech2019.org/ |
Keywords
- Augmented CycleGAN
- Lombard speech
- Pulse-model in log domain vocoder
- Style conversion
- Vocal effort
Projects related to 'Augmented CycleGANs for continuous scale normal-to-Lombard speaking style conversion'
-
Computational basis of contextually grounded language acquisition in humans and machines
Räsänen, O.
31/12/2017 → 31/08/2023
Project: Academy of Finland: Other research funding
-
Interdisciplinary research on statistical parametric speech synthesis
Alku, P., Nonavinakere Prabhakera, N., Bollepalli, B., Bäckström, T., Murtola, T., Airaksinen, M. & Juvela, L.
01/01/2018 → 31/12/2019
Project: Academy of Finland: Other research funding
-
ACLEW: Analyzing Child Language Experiences Around the World
Räsänen, O. & Seshadri, S.
01/06/2017 → 31/05/2020
Project: Academy of Finland: Other research funding