High-pitched excitation generation for glottal vocoding in statistical parametric speech synthesis using a deep neural network

Lauri Juvela, Bajibabu Bollepalli, Manu Airaksinen, Paavo Alku

Research output: Article in book/conference proceedings › Conference article in proceedings › Scientific › peer-reviewed

28 Citations (Scopus)

Abstract

Achieving high quality and naturalness in statistical parametric synthesis of female voices remains difficult despite recent advances in the field. Vocoding is a key element in all statistical speech synthesizers and is known to affect synthesis quality and naturalness. The present study focuses on a special type of vocoder, the glottal vocoder, which aims to parameterize speech by modelling the real excitation of (voiced) speech, the glottal flow. More specifically, we compare three glottal vocoders with the aim of improving the synthesis naturalness of female voices. Two of the vocoders are previously known, and both use an older glottal inverse filtering (GIF) method to estimate the glottal flow. The third one, denoted Quasi Closed Phase - Deep Neural Net (QCP-DNN), takes advantage of a recently proposed GIF method that shows improved accuracy in estimating the glottal flow from high-pitched speech. Subjective listening tests conducted on a US English female voice show that the proposed QCP-DNN method gives a significant improvement in synthetic naturalness compared to the two previously developed glottal vocoders.

Original language: English
Title of host publication: IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016
Subtitle: Proceedings
Publisher: IEEE
Pages: 5120-5124
Number of pages: 5
Volume: 2016-May
ISBN (print): 9781479999880
DOIs
Publication status: Published - 18 May 2016
MoE publication type: A4 Article in conference proceedings
Event: IEEE International Conference on Acoustics, Speech, and Signal Processing - Shanghai, China
Duration: 20 Mar 2016 to 25 Mar 2016
Conference number: 41
http://www.icassp2016.org/

Publication series

Name: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISSN (print): 1520-6149
ISSN (electronic): 2379-190X

Conference

Conference: IEEE International Conference on Acoustics, Speech, and Signal Processing
Abbreviated title: ICASSP 2016
Country/Territory: China
City: Shanghai
Period: 20/03/2016 to 25/03/2016
Internet address
