Speech serves a fundamental function in society as one of the main tools enabling conveyance of information from one human to another. The ease with which humans use this tool is deceptive, however, as the physiological process of producing speech sounds is far from simple. Understanding this production process can, for example, bring valuable information to the development of speech and language technology applications and help in diagnosing and treating speech disorders. Obtaining information about speech production can be challenging, however, due to the location of the speech organs and the nature of the physiological processes involved.
This dissertation focuses on the production of one specific aspect of speech, vowels, which are a major component of all spoken languages. Two methodologies, computational physical modelling and glottal inverse filtering (GIF), are used here to investigate vowel production phenomena. Computational speech production models enable simulation of vowel production with virtually complete control of all variables of interest, which is not possible with human speakers. In contrast, GIF offers a tool to investigate the natural vowel production process.
Both physical modelling and GIF benefit from the utilisation of multichannel data of natural speech. In this dissertation, two multichannel datasets were collected. The first dataset comprises simultaneously acquired speech pressure signals and magnetic resonance imaging (MRI) data of the vocal tract (VT). The second dataset consists of speech pressure signals recorded simultaneously with high-speed videoendoscopy (HSV) of the vocal folds and electroglottography (EGG). The two datasets were utilised together with computational physical modelling and GIF in order to investigate two dynamic phenomena: the onset of phonation in vowel utterances and fundamental frequency glides, which are vowel utterances in which the fundamental frequency increases or decreases over time.
The results of using HSV and GIF together to analyse onsets indicate that the amplitudes of glottal area and flow are closely correlated during phonation initiation despite the presence of non-linear processes and possible dynamic control by the speakers. Simulations of fundamental frequency glides with a computational physics model utilising MRI data for VT modelling reveal that the perturbations occurring when the fundamental frequency crosses a resonance of the VT follow distinct patterns. In addition to these two dynamic phenomena, HSV, GIF, and computational physical modelling were also used to study the relationship between the glottal area and flow during steady phonation. These investigations show how different elements of phonation and articulation contribute to the complex process of vowel production, and it would not be possible to obtain this information without novel combinations of data acquisition, analysis methods, and modelling.
|Publication status|Published - 2019|
|MoE publication type|G5 Doctoral dissertation (article)|
Keywords: vowel production, physical models, glottal inverse filtering, vowel onsets, fundamental frequency glides