A multichannel dataset comprising high-speed videoendoscopy images, electroglottography signals, and free-field microphone signals was used to investigate phonation onsets in vowel production. Use of the multichannel data enabled simultaneous analysis of the two main aspects of phonation: glottal area, extracted from the high-speed videoendoscopy images, and glottal flow, estimated from the microphone signal using glottal inverse filtering. Pulse-wise parameterization of the glottal area and glottal flow indicates that there is no single dominant way to initiate quasi-stable phonation. The trajectories of fundamental frequency and normalized amplitude quotient, extracted from glottal area and estimated flow, may differ markedly during onsets. The location and steepness of the amplitude envelopes of the two signals were observed to be closely related, and quantitative analysis supported the hypothesis that glottal area and flow do not carry essentially different amplitude information during vowel onsets. Linear models were used to predict the phonation onset times from the characteristics of the subsequent steady phonation. The phonation onset time of glottal area was found to have good predictability from a combination of the fundamental frequency and the normalized amplitude quotient of the glottal flow, as well as the gender of the speaker. For the phonation onset time of glottal flow, the best linear model was obtained using the fundamental frequency and the normalized amplitude quotient of the glottal flow as predictors.
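The pulse-wise parameterization mentioned above relies on the normalized amplitude quotient (NAQ). The abstract does not spell out its definition, so the following is only a minimal sketch that assumes the commonly used form: the peak-to-peak amplitude of one flow pulse divided by the product of the magnitude of the negative peak of the flow derivative and the cycle length. The names `naq`, `flow_pulse`, and `fs` are hypothetical, and the first-difference derivative is an illustrative choice, not necessarily the procedure used in the study.

```python
def naq(flow_pulse, fs):
    """Normalized amplitude quotient of a single glottal pulse (sketch).

    flow_pulse: sequence of samples covering exactly one glottal cycle
                (assumed to be segmented beforehand)
    fs:         sampling rate in Hz
    """
    if len(flow_pulse) < 3:
        raise ValueError("need at least three samples in a pulse")

    # Cycle length T in seconds.
    period = len(flow_pulse) / fs

    # Peak-to-peak (AC) amplitude of the pulse.
    f_ac = max(flow_pulse) - min(flow_pulse)

    # First difference scaled by fs approximates the time derivative.
    deriv = [(b - a) * fs for a, b in zip(flow_pulse, flow_pulse[1:])]

    # Magnitude of the negative peak of the derivative (closing phase).
    d_peak = -min(deriv)
    if d_peak <= 0:
        raise ValueError("pulse has no closing phase")

    return f_ac / (d_peak * period)


# Synthetic triangular pulse, for illustration only (not data from the study).
example_pulse = [0, 1, 2, 3, 4, 2, 0, 0, 0, 0]
example_naq = naq(example_pulse, fs=1000)
```

Because both the amplitude and the derivative scale with the signal gain, the quotient is unaffected by the absolute scale of the pulse, which is convenient when the same parameter is computed from signals in different units, such as glottal area and estimated flow.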
Murtola, T., Malinen, J., Geneid, A., & Alku, P. (2019). Analysis of phonation onsets in vowel production, using information from glottal area and flow estimate. Speech Communication, 109, 55-65. https://doi.org/10.1016/j.specom.2019.03.007