An interdisciplinary research project is proposed to develop statistical text-to-speech synthesis (TTS) technologies. We will focus on the core of TTS, the vocoder, which is the parametric block that generates the synthetic speech. We will develop entirely new vocoding techniques based on a physiologically motivated modelling approach. The models studied use glottal inverse filtering (GIF), a computational method that separates a speech signal into its glottal excitation and vocal tract contributions.
In particular, the project aims to develop new automatic GIF-based vocoders that outperform current methods, especially in parameterizing challenging data such as female and child speech, whose high pitch makes parameterization difficult. The resulting vocoders will be integrated into synthesis platforms to generate speech from arbitrary text. The project is expected to improve the naturalness of spoken interaction systems and therefore has many potential ICT-related applications, such as speech-to-speech translation and assistive technology.