Waveform generation for text-to-speech synthesis using pitch-synchronous multi-scale generative adversarial networks

Research output: Article in book/conference proceedings; Conference contribution; Scientific; Peer-reviewed

2 Citations (Scopus)
121 Downloads (Pure)

Abstract

The state-of-the-art in text-to-speech (TTS) synthesis has recently improved considerably due to novel neural waveform generation methods, such as WaveNet. However, these methods suffer from their slow sequential inference process, while their parallel versions are difficult to train and even more computationally expensive. Meanwhile, generative adversarial networks (GANs) have achieved impressive results in image generation and are making their way into audio applications; parallel inference is among their lucrative properties. By adopting recent advances in GAN training techniques, this investigation studies waveform generation for TTS in two domains (speech signal and glottal excitation). Listening test results show that while direct waveform generation with GAN is still far behind WaveNet, a GAN-based glottal excitation model can achieve quality and voice similarity on par with a WaveNet vocoder.
Original language: English
Title of host publication: ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Publisher: IEEE
Pages: 6915 - 6919
Number of pages: 5
ISBN (electronic): 978-1-4799-8131-1
ISBN (print): 978-1-4799-8132-8
DOI (permanent link): https://doi.org/10.1109/ICASSP.2019.8683271
Status: Published - 1 May 2019
Ministry of Education and Culture (OKM) publication type: A4 Article in conference proceedings
Event: IEEE International Conference on Acoustics, Speech, and Signal Processing - Brighton, United Kingdom
Duration: 12 May 2019 to 17 May 2019
Conference number: 44

Publication series

Name: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing
ISSN (print): 1520-6149
ISSN (electronic): 2379-190X

Conference

Conference: IEEE International Conference on Acoustics, Speech, and Signal Processing
Abbreviated title: ICASSP
Country: United Kingdom
City: Brighton
Period: 12/05/2019 to 17/05/2019


  • Projects

    • 1 Finished

    Poikkitieteellinen parametrisen puhesynteesin tutkimusprojekti (Interdisciplinary research project on parametric speech synthesis)

    Murtola, T., Bollepalli, B., Nonavinakere Prabhakera, N., Juvela, L., Airaksinen, M., Bäckström, T. & Alku, P.

    01/01/2018 to 24/01/2020

    Project: Academy of Finland: Other research funding

    Equipment

    Science-IT

    Mikko Hakala (Manager)

    School of Science

    Equipment/facilities: Facility

  • Cite this

    Juvela, L., Bollepalli, B., Yamagishi, J., & Alku, P. (2019). Waveform generation for text-to-speech synthesis using pitch-synchronous multi-scale generative adversarial networks. In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 6915 - 6919). [8683271] (Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing). IEEE. https://doi.org/10.1109/ICASSP.2019.8683271