GlottDNN - A full-band glottal vocoder for statistical parametric speech synthesis

Manu Airaksinen, Bajibabu Bollepalli, Lauri Juvela, Zhizheng Wu, Simon King, Paavo Alku

Research output: Article in book/conference proceedings, Conference article in proceedings, Scientific, peer-reviewed

34 Citations (Scopus)

Abstract

GlottHMM is a previously developed vocoder that has been successfully used in HMM-based synthesis by parameterizing speech into two parts (glottal flow, vocal tract) according to the functioning of the real human voice production mechanism. In this study, a new glottal vocoding method, GlottDNN, is proposed. The GlottDNN vocoder is built on the principles of its predecessor, GlottHMM, but the new vocoder introduces three main improvements: GlottDNN (1) takes advantage of a new, more accurate glottal inverse filtering method, (2) uses a new method of deep neural network (DNN)-based glottal excitation generation, and (3) proposes a new approach of band-wise processing of full-band speech. The proposed GlottDNN vocoder was evaluated as part of a full-band state-of-the-art DNN-based text-to-speech (TTS) synthesis system, and compared against the release version of the original GlottHMM vocoder and the well-known STRAIGHT vocoder. The results of the subjective listening test indicate that GlottDNN improves the TTS quality over the compared methods.

Original language: English
Title of host publication: Proceedings of the Annual Conference of the International Speech Communication Association
Subtitle: Interspeech'16, San Francisco, USA, Sept. 8-12, 2016
Publisher: International Speech Communication Association (ISCA)
Pages: 2473-2477
Number of pages: 5
Volume: 08-12-September-2016
ISBN (electronic): 978-1-5108-3313-5
DOI - permanent links
Status: Published - 2016
OKM publication type: A4 Article in conference proceedings
Event: Interspeech - San Francisco, United States
Duration: 8 Sept 2016 to 12 Sept 2016
Conference number: 17

Publication series

Name: Proceedings of the Annual Conference of the International Speech Communication Association
Publisher: International Speech Communication Association
ISSN (print): 1990-9770
ISSN (electronic): 2308-457X

Conference

Conference: Interspeech
Country/Territory: United States
City: San Francisco
Period: 08/09/2016 to 12/09/2016
