Interdisciplinary research on statistical parametric speech synthesis

Project Details


An interdisciplinary research project is proposed to develop statistical text-to-speech (TTS) synthesis technologies. We will focus on the core of statistical TTS, the vocoder, which is the parametric block that generates synthetic speech. We will search for completely new vocoding techniques based on a physiologically motivated modelling approach. The models studied utilize glottal inverse filtering (GIF), a computational method that separates speech into the glottal excitation and the vocal tract. The project aims particularly at new automatic GIF-based vocoders that outperform current methods, especially in the parameterization of challenging data such as female or child speech. The vocoders developed will be integrated into synthesis platforms to generate speech from arbitrary texts. The project is expected to improve the naturalness of spoken interaction systems and thus to have many potential ICT-related applications (e.g., speech-to-speech translation and assistive technology).
Short title: AproTEAM 2018-2019
Effective start/end date: 01/01/2018 to 31/12/2019