Abstract
GlottHMM is a previously developed vocoder that has been used successfully in HMM-based synthesis. It parameterizes speech into two parts (glottal flow and vocal tract), following the way the human voice production mechanism works. In this study, a new glottal vocoding method, GlottDNN, is proposed. The GlottDNN vocoder is built on the principles of its predecessor, GlottHMM, but introduces three main improvements: GlottDNN (1) takes advantage of a new, more accurate glottal inverse filtering method, (2) uses a new method of deep neural network (DNN)-based glottal excitation generation, and (3) proposes a new approach to band-wise processing of full-band speech. The proposed GlottDNN vocoder was evaluated as part of a full-band, state-of-the-art DNN-based text-to-speech (TTS) synthesis system and compared against the release version of the original GlottHMM vocoder and the well-known STRAIGHT vocoder. The results of the subjective listening test indicate that GlottDNN improves TTS quality over the compared methods.
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the Annual Conference of the International Speech Communication Association |
| Subtitle of host publication | Interspeech'16, San Francisco, USA, Sept. 8-12, 2016 |
| Publisher | International Speech Communication Association (ISCA) |
| Pages | 2473-2477 |
| Number of pages | 5 |
| Volume | 08-12-September-2016 |
| ISBN (Electronic) | 978-1-5108-3313-5 |
| Publication status | Published - 2016 |
| MoE publication type | A4 Conference publication |
| Event | Interspeech, San Francisco, United States. Duration: 8 Sept 2016 → 12 Sept 2016. Conference number: 17 |
Publication series
| Name | Proceedings of the Annual Conference of the International Speech Communication Association |
|---|---|
| Publisher | International Speech Communication Association |
| ISSN (Print) | 1990-9770 |
| ISSN (Electronic) | 2308-457X |
Conference
| Conference | Interspeech |
|---|---|
| Country/Territory | United States |
| City | San Francisco |
| Period | 08/09/2016 → 12/09/2016 |
Keywords
- Deep neural network
- Glottal inverse filtering
- Speech synthesis
- Vocoder