Abstract
Recent advances in speech synthesis suggest that limitations such as the lossy nature of the amplitude spectrum with minimum-phase approximation and the over-smoothing effect in acoustic modeling can be overcome by using advanced machine learning approaches. In this paper, we build a framework in which new vocoding and acoustic modeling techniques can be fairly compared with conventional approaches by means of a large-scale crowdsourced evaluation. Results on acoustic models showed that generative adversarial networks and an autoregressive (AR) model both performed better than a normal recurrent network, with the AR model performing best. Evaluation of vocoders using the same AR acoustic model demonstrated that a Wavenet vocoder outperformed classical source-filter-based vocoders. In particular, speech waveforms generated by the combination of the AR acoustic model and the Wavenet vocoder achieved a speech quality score similar to that of vocoded speech.
Original language | English |
---|---|
Title of host publication | 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings |
Place of Publication | United States |
Publisher | IEEE |
Pages | 4804-4808 |
Number of pages | 5 |
Volume | 2018-April |
ISBN (Electronic) | 978-1-5386-4658-8 |
ISBN (Print) | 978-1-5386-4659-5 |
Publication status | Published - 10 Sept 2018 |
MoE publication type | A4 Article in a conference publication |
Event | IEEE International Conference on Acoustics, Speech, and Signal Processing, Calgary, Canada. Duration: 15 Apr 2018 → 20 Apr 2018. https://2018.ieeeicassp.org/ |
Publication series
Name | Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing |
---|---|
ISSN (Electronic) | 2379-190X |
Conference
Conference | IEEE International Conference on Acoustics, Speech, and Signal Processing |
---|---|
Abbreviated title | ICASSP |
Country/Territory | Canada |
City | Calgary |
Period | 15/04/2018 → 20/04/2018 |
Keywords
- Autoregressive neural network
- Deep learning
- Generative adversarial network
- Speech synthesis
- Wavenet