Abstract
GlottHMM is a previously developed vocoder that has been successfully used in HMM-based synthesis by parameterizing speech into two parts (glottal flow, vocal tract) according to the functioning of the real human voice production mechanism. In this study, a new glottal vocoding method, GlottDNN, is proposed. The GlottDNN vocoder is built on the principles of its predecessor, GlottHMM, but introduces three main improvements: GlottDNN (1) takes advantage of a new, more accurate glottal inverse filtering method, (2) uses a new method of deep neural network (DNN)-based glottal excitation generation, and (3) proposes a new approach to band-wise processing of full-band speech. The proposed GlottDNN vocoder was evaluated as part of a full-band state-of-the-art DNN-based text-to-speech (TTS) synthesis system and compared against the release version of the original GlottHMM vocoder and the well-known STRAIGHT vocoder. The results of the subjective listening test indicate that GlottDNN improves the TTS quality over the compared methods.
Original language | English |
---|---|
Title of host publication | Proceedings of the Annual Conference of the International Speech Communication Association |
Subtitle of host publication | Interspeech'16, San Francisco, USA, Sept. 8-12, 2016 |
Publisher | International Speech Communication Association |
Pages | 2473-2477 |
Number of pages | 5 |
Volume | 08-12-September-2016 |
ISBN (Electronic) | 978-1-5108-3313-5 |
DOIs | |
Publication status | Published - 2016 |
MoE publication type | A4 Article in a conference publication |
Event | Interspeech - San Francisco, United States; Duration: 8 Sep 2016 → 12 Sep 2016; Conference number: 17 |
Publication series
Name | Proceedings of the Annual Conference of the International Speech Communication Association |
---|---|
Publisher | International Speech Communication Association |
ISSN (Print) | 1990-9770 |
ISSN (Electronic) | 2308-457X |
Conference
Conference | Interspeech |
---|---|
Country/Territory | United States |
City | San Francisco |
Period | 08/09/2016 → 12/09/2016 |
Keywords
- Deep neural network
- Glottal inverse filtering
- Speech synthesis
- Vocoder