- Japan National Institute of Information and Communications Technology
- University of Arizona
This study reviews and compares three variants of the glottal inverse filtering algorithm based on iterative adaptive inverse filtering (IAIF): the Standard algorithm and two recently proposed variants that use iterative optimal preemphasis (IOP) and a glottal flow model (GFM), respectively. To enable an objective evaluation, a computational physical model of voice production is used to generate time-domain signals for both the input glottal flow and the output speech pressure, covering a wide range of vowels, fundamental frequencies, and voice qualities (involving co-variation of phonation type and loudness). Furthermore, for a fair comparison, the three key parameters of IAIF are selected by an exhaustive search that minimizes the root-mean-square error between the estimated and reference glottal flow derivative in each analyzed frame, and performance is assessed with two time-domain and two frequency-domain error measures. A conventional evaluation is also carried out with fixed parameter values determined by cross-validation. Results indicate that IOP tends to yield the lowest errors for nonback vowels (reducing errors by 31% on average compared with Standard), especially when the fundamental frequency is not too high and the voice quality is not too pressed; GFM becomes competitive for normal phonations when fixed parameter values are used; and in other cases, Standard IAIF is still recommended. In addition, the results suggest that not only the overall spectral tilt (as controlled by IOP and GFM) but also the balance between the levels of different spectral regions can be important for accurate estimation of the glottal flow.
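The per-frame parameter selection described in the abstract can be illustrated with a minimal Python sketch. It assumes a generic estimator function and a small parameter grid; the names `exhaustive_search`, `toy_estimator`, `gain`, `shift`, and `offset` are hypothetical and are not taken from the study. The toy estimator is only a stand-in so that the example runs; it is not IAIF, and the three actual IAIF parameters are not specified here.

```python
import itertools
import numpy as np

def rmse(estimate, reference):
    """Root-mean-square error between two equal-length signals."""
    estimate = np.asarray(estimate, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return float(np.sqrt(np.mean((estimate - reference) ** 2)))

def exhaustive_search(estimator, frame, reference, grid):
    """Evaluate every parameter combination in `grid` and keep the one
    whose estimate has the lowest RMSE against `reference`."""
    names = list(grid)
    best_params, best_err = None, float("inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        err = rmse(estimator(frame, **params), reference)
        if err < best_err:
            best_params, best_err = params, err
    return best_params, best_err

# Hypothetical stand-in for an inverse filter (NOT IAIF): scales, circularly
# shifts, and offsets the frame so the search has something to optimize.
def toy_estimator(frame, gain, shift, offset):
    return gain * np.roll(frame, shift) + offset

# One synthetic "frame" and a reference built with known parameter values.
frame = np.sin(np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False))
reference = 2.0 * np.roll(frame, 3) + 0.5

# Assumed three-parameter grid; the values are illustrative only.
grid = {"gain": [1.0, 2.0, 3.0], "shift": [0, 3, 6], "offset": [0.0, 0.5]}

best_params, best_err = exhaustive_search(toy_estimator, frame, reference, grid)
print(best_params, best_err)
```

In a setting like the one the abstract describes, the search would be repeated for every analyzed frame, which is what distinguishes it from the fixed-parameter evaluation based on cross-validation.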
|Status||Published - 2018|
|MoE publication type||A1 Journal article-refereed|