Gelp: GAN-excited linear prediction for speech synthesis from mel-spectrogram

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

Abstract

Recent advances in neural network-based text-to-speech have reached human-level naturalness in synthetic speech. Current sequence-to-sequence models can directly map text to mel-spectrogram acoustic features, which are convenient for modeling but present additional challenges for vocoding (i.e., waveform generation from the acoustic features). High-quality synthesis can be achieved with neural vocoders, such as WaveNet, but such autoregressive models suffer from slow sequential inference. Meanwhile, their existing parallel inference counterparts are difficult to train and require increasingly large model sizes. In this paper, we propose an alternative training strategy for a parallel neural vocoder utilizing generative adversarial networks, and integrate a linear predictive synthesis filter into the model. Results show that the proposed model achieves a significant improvement in inference speed, while outperforming a WaveNet in copy-synthesis quality.
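To illustrate the linear predictive synthesis filter mentioned in the abstract, the following is a minimal sketch of all-pole filtering of an excitation signal. It is not the authors' implementation: the function names, the second-order coefficients, and the white-noise excitation are assumptions chosen for illustration. In a GAN-excited setting, the excitation would instead come from the trained generator.

```python
import numpy as np

def lp_synthesis(excitation, a):
    """All-pole synthesis: s[n] = e[n] - sum_{k=1..p} a[k] * s[n-k].

    `a` holds the LP polynomial coefficients with a[0] assumed to be 1.
    """
    p = len(a) - 1
    s = np.zeros(len(excitation))
    for n in range(len(excitation)):
        acc = excitation[n]
        # Subtract the contribution of previously synthesized samples.
        for k in range(1, min(p, n) + 1):
            acc -= a[k] * s[n - k]
        s[n] = acc
    return s

def lp_residual(signal, a):
    """Inverse (FIR) filtering: e[n] = sum_{k=0..p} a[k] * s[n-k]."""
    return np.convolve(signal, a)[:len(signal)]

# Hypothetical stable LP polynomial (complex pole pair at radius 0.9).
a = np.array([1.0, -1.2, 0.81])

# Placeholder excitation: white noise standing in for a generator output.
rng = np.random.default_rng(0)
excitation = rng.standard_normal(160)

speech = lp_synthesis(excitation, a)
recovered = lp_residual(speech, a)
```

Because the synthesis and inverse filters are exact inverses of each other, `recovered` matches `excitation` up to floating-point error. This is what allows a model to work in the excitation domain while the filter imposes the spectral envelope.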

Original language: English
Title of host publication: Proceedings of Interspeech
Publisher: International Speech Communication Association
Pages: 694-698
Number of pages: 5
Volume: 2019-September
DOIs: https://doi.org/10.21437/Interspeech.2019-2008
Publication status: Published - 1 Jan 2019
MoE publication type: A4 Article in a conference publication
Event: Interspeech - Graz, Austria
Duration: 15 Sep 2019 to 19 Sep 2019
https://www.interspeech2019.org/

Publication series

Name: Interspeech - Annual Conference of the International Speech Communication Association
ISSN (Electronic): 2308-457X

Conference

Conference: Interspeech
Country: Austria
City: Graz
Period: 15/09/2019 to 19/09/2019
Internet address: https://www.interspeech2019.org/

Keywords

  • GAN
  • Neural vocoder
  • Source-filter model
  • WaveNet

  • Equipment

    Science-IT, School of Science
    Mikko Hakala (Manager)
    Facility/equipment: Facility

  • Cite this

    Juvela, L., Bollepalli, B., Yamagishi, J., & Alku, P. (2019). Gelp: GAN-excited linear prediction for speech synthesis from mel-spectrogram. In Proceedings of Interspeech (Vol. 2019-September, pp. 694-698). (Interspeech - Annual Conference of the International Speech Communication Association). International Speech Communication Association. https://doi.org/10.21437/Interspeech.2019-2008