A Comparison of Recent Waveform Generation and Acoustic Modeling Methods for Neural-Network-Based Speech Synthesis

Xin Wang, Jaime Lorenzo-Trueba, Shinji Takaki, Lauri Juvela, Junichi Yamagishi

Research output: Article in book/conference proceedings, Conference contribution, Scientific, peer-reviewed

36 Citations (Scopus)

Abstract

Recent advances in speech synthesis suggest that limitations such as the lossy nature of the amplitude spectrum with minimum phase approximation and the over-smoothing effect in acoustic modeling can be overcome by using advanced machine learning approaches. In this paper, we build a framework in which we can fairly compare new vocoding and acoustic modeling techniques with conventional approaches by means of a large scale crowdsourced evaluation. Results on acoustic models showed that generative adversarial networks and an autoregressive (AR) model performed better than a normal recurrent network and the AR model performed best. Evaluation on vocoders by using the same AR acoustic model demonstrated that a Wavenet vocoder outperformed classical source-filter-based vocoders. Particularly, generated speech waveforms from the combination of AR acoustic model and Wavenet vocoder achieved a similar score of speech quality to vocoded speech.

Original language: English
Title of host publication: 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings
Place of publication: United States
Publisher: IEEE
Pages: 4804-4808
Number of pages: 5
Volume: 2018-April
ISBN (electronic): 978-1-5386-4658-8
ISBN (print): 978-1-5386-4659-5
Status: Published - 10 September 2018
MoE publication type: A4 Article in conference proceedings
Event: IEEE International Conference on Acoustics, Speech, and Signal Processing - Calgary, Canada
Duration: 15 April 2018 to 20 April 2018
https://2018.ieeeicassp.org/

Publication series

Name: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing
ISSN (electronic): 2379-190X

Conference

Conference: IEEE International Conference on Acoustics, Speech, and Signal Processing
Abbreviated title: ICASSP
Country: Canada
City: Calgary
Period: 15/04/2018 to 20/04/2018
