Abstract
The state-of-the-art in text-to-speech (TTS) synthesis has recently improved considerably due to novel neural waveform generation methods, such as WaveNet. However, these methods suffer from their slow sequential inference process, while their parallel versions are difficult to train and even more computationally expensive. Meanwhile, generative adversarial networks (GANs) have achieved impressive results in image generation and are making their way into audio applications; parallel inference is among their lucrative properties. By adopting recent advances in GAN training techniques, this investigation studies waveform generation for TTS in two domains (speech signal and glottal excitation). Listening test results show that while direct waveform generation with GAN is still far behind WaveNet, a GAN-based glottal excitation model can achieve quality and voice similarity on par with a WaveNet vocoder.
Original language | English |
---|---|
Title of host publication | ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) |
Publisher | IEEE |
Pages | 6915 - 6919 |
Number of pages | 5 |
ISBN (Electronic) | 978-1-4799-8131-1 |
ISBN (Print) | 978-1-4799-8132-8 |
Publication status | Published - 1 May 2019 |
MoE publication type | A4 Conference publication |
Event | IEEE International Conference on Acoustics, Speech, and Signal Processing - Brighton, United Kingdom; Duration: 12 May 2019 → 17 May 2019; Conference number: 44 |
Publication series
Name | Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing |
---|---|
ISSN (Print) | 1520-6149 |
ISSN (Electronic) | 2379-190X |
Conference
Conference | IEEE International Conference on Acoustics, Speech, and Signal Processing |
---|---|
Abbreviated title | ICASSP |
Country/Territory | United Kingdom |
City | Brighton |
Period | 12/05/2019 → 17/05/2019 |
Keywords
- Neural vocoding
- text-to-speech
- GAN
- glottal excitation model
Fingerprint
Dive into the research topics of 'Waveform generation for text-to-speech synthesis using pitch-synchronous multi-scale generative adversarial networks'. Together they form a unique fingerprint.
Projects
- 1 Finished
- Interdisciplinary research on statistical parametric speech synthesis
Alku, P. (Principal investigator), Bäckström, T. (Project Member), Juvela, L. (Project Member), Murtola, T. (Project Member), Nonavinakere Prabhakera, N. (Project Member), Bollepalli, B. (Project Member) & Airaksinen, M. (Project Member)
01/01/2018 → 31/12/2019
Project: Academy of Finland: Other research funding