Effects of training data variety in generating glottal pulses from acoustic features with DNNs

Manu Airaksinen, Paavo Alku

Research output: Chapter in book/conference proceedings › Conference article in proceedings › Scientific › peer-reviewed

2 Citations (Scopus)
304 Downloads (Pure)

Abstract

The glottal volume velocity waveform, the acoustical excitation of voiced speech, cannot be acquired through direct measurements in normal production of continuous speech. Glottal inverse filtering (GIF), however, can be used to estimate the glottal flow from recorded speech signals. Unfortunately, the usefulness of GIF algorithms is limited since they are sensitive to noise and call for high-quality recordings. Recently, efforts have been made to expand the use of GIF by training deep neural networks (DNNs) to learn a statistical mapping between frame-level acoustic features and glottal pulses estimated by GIF. This framework has been successfully utilized in statistical speech synthesis in the form of the GlottDNN vocoder, which uses a DNN to generate glottal pulses to be used as the synthesizer’s excitation waveform. In this study, we investigate how the DNN-based generation of glottal pulses is affected by training data variety. The evaluation is done using both objective measures and subjective listening tests of synthetic speech. The results suggest that the performance of the glottal pulse generation with DNNs is affected particularly by how well the training corpus suits GIF: processing low-pitched male speech and sustained phonations shows better performance than processing high-pitched female voices or continuous speech.
Original language: English
Title of host publication: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publisher: International Speech Communication Association (ISCA)
Pages: 3946-3950
Number of pages: 5
Volume: 2017-August
Publication status: Published - Aug 2017
MoE publication type: A4 Article in a conference publication
Event: Interspeech - Stockholm, Sweden
Duration: 20 Aug 2017 to 24 Aug 2017
Conference number: 18
http://www.interspeech2017.org/

Publication series

Name: Interspeech: Annual Conference of the International Speech Communication Association
ISSN (electronic): 1990-9772

Conference

Conference: Interspeech
Country/Territory: Sweden
City: Stockholm
Period: 20/08/2017 to 24/08/2017
