Towards glottal source controllability in expressive speech synthesis

Jaime Lorenzo-Trueba, Roberto Barra-Chicote, Tuomo Raitio, Nicolas Obin, Paavo Alku, Junichi Yamagishi, Juan Montero

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > Scientific > peer-review

9 Citations (Scopus)

Abstract

To obtain more human-like human-machine interfaces, we must first be able to give them expressive capabilities, in the form of emotional and stylistic features, so that they can be closely adapted to the intended task. To replicate those features, it is not enough to merely replicate the prosodic information of fundamental frequency and speaking rhythm. The proposed additional layer is the modification of the glottal model, for which we make use of the GlottHMM parameters. This paper analyzes the viability of such an approach by verifying that the expressive nuances are captured by the aforementioned features, obtaining recognition rates of 95% on styled speech and 82% on emotional speech. We then evaluate the effect of speaker bias and recording environment on the source modeling in order to quantify possible problems when analyzing multi-speaker databases. Finally, we propose a separation of speaking styles for Spanish based on prosodic features and check its perceptual significance.
Original language: English
Title of host publication: Interspeech 2012, Portland, Oregon, USA, Sept. 9-13, 2012
Pages: 1618-1621
Publication status: Published - 2012
MoE publication type: A4 Article in a conference publication

Keywords

  • speech
