GlottDNN - A full-band glottal vocoder for statistical parametric speech synthesis

GlottHMM is a previously developed vocoder that has been used successfully in HMM-based synthesis. It parameterizes speech into two parts (glottal flow and vocal tract), following the human voice production mechanism. In this study, a new glottal vocoding method, GlottDNN, is proposed. The GlottDNN vocoder is built on the principles of its predecessor, GlottHMM, but introduces three main improvements: GlottDNN (1) takes advantage of a new, more accurate glottal inverse filtering method, (2) uses a new method of deep neural network (DNN)-based glottal excitation generation, and (3) proposes a new approach to band-wise processing of full-band speech. The proposed GlottDNN vocoder was evaluated as part of a full-band state-of-the-art DNN-based text-to-speech (TTS) synthesis system and compared against the release version of the original GlottHMM vocoder and the well-known STRAIGHT vocoder. The results of the subjective listening test indicate that GlottDNN improves TTS quality over the compared methods.


Title: Proceedings of the Annual Conference of the International Speech Communication Association
Subtitle: Interspeech'16, San Francisco, USA, Sept. 8-12, 2016
Status: Published - 2016
MoEC publication type: A4 Article in conference proceedings
Event: Interspeech - San Francisco, United States
Duration: 8 September 2016 to 12 September 2016
Conference number: 17


Name: Proceedings of the Annual Conference of the International Speech Communication Association
Publisher: International Speech Communication Association
ISSN (print): 1990-9770
ISSN (electronic): 2308-457X


City: San Francisco

