A Comparison of Recent Waveform Generation and Acoustic Modeling Methods for Neural-Network-Based Speech Synthesis

Xin Wang, Jaime Lorenzo-Trueba, Shinji Takaki, Lauri Juvela, Junichi Yamagishi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

35 Citations (Scopus)

Abstract

Recent advances in speech synthesis suggest that limitations such as the lossy nature of the amplitude spectrum with minimum phase approximation and the over-smoothing effect in acoustic modeling can be overcome by using advanced machine learning approaches. In this paper, we build a framework in which we can fairly compare new vocoding and acoustic modeling techniques with conventional approaches by means of a large-scale crowdsourced evaluation. Results on acoustic models showed that generative adversarial networks and an autoregressive (AR) model performed better than a normal recurrent network, with the AR model performing best. An evaluation of vocoders using the same AR acoustic model demonstrated that a Wavenet vocoder outperformed classical source-filter-based vocoders. In particular, speech waveforms generated by the combination of the AR acoustic model and the Wavenet vocoder achieved a speech quality score similar to that of vocoded speech.

Original language: English
Title of host publication: 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings
Place of Publication: United States
Publisher: IEEE
Pages: 4804-4808
Number of pages: 5
Volume: 2018-April
ISBN (Electronic): 978-1-5386-4658-8
ISBN (Print): 978-1-5386-4659-5
DOIs
Publication status: Published - 10 Sep 2018
MoE publication type: A4 Article in a conference publication
Event: IEEE International Conference on Acoustics, Speech, and Signal Processing - Calgary, Canada
Duration: 15 Apr 2018 to 20 Apr 2018
https://2018.ieeeicassp.org/

Publication series

Name: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing
ISSN (Electronic): 2379-190X

Conference

Conference: IEEE International Conference on Acoustics, Speech, and Signal Processing
Abbreviated title: ICASSP
Country: Canada
City: Calgary
Period: 15/04/2018 to 20/04/2018
Internet address

Keywords

  • Autoregressive neural network
  • Deep learning
  • Generative adversarial network
  • Speech synthesis
  • Wavenet
