Extraction and Utilization of Excitation Information of Speech: A Review

Sudarsana Kadiri, Paavo Alku, Bayya Yegnanarayana

Research output: Contribution to journal › Review Article › peer-review


Abstract

Speech production can be regarded as a process in which a time-varying vocal tract system (filter) is excited by a time-varying excitation. In addition to its linguistic message, the speech signal carries information about, for example, the gender and age of the speaker. Moreover, the speech signal includes acoustical cues about several speaker traits, such as the emotional state and the state of health of the speaker. To understand how the human speech production mechanism produces these acoustical cues, and to utilize this information in speech technology, it is necessary to extract features describing both the excitation and the filter of the human speech production mechanism. While methods to estimate and parameterize the vocal tract system are well established, the excitation has been studied less. This article provides a review of signal processing approaches used for the extraction of excitation information from speech. This article highlights the importance of excitation information in the analysis and classification of phonation type and vocal emotions, in the analysis of nonverbal laughter sounds, and in studying pathological voices. Furthermore, recent developments in deep learning techniques in the context of extraction and utilization of the excitation information are discussed.
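To make the source-filter view above concrete, here is a minimal sketch of one classical excitation estimate: the linear prediction (LP) residual. It does not claim to be the specific method reviewed in the article. The code synthesizes a toy "voiced" signal by passing an impulse train through an all-pole filter with two formants. It then fits an order-4 LP model with the autocorrelation method (Levinson-Durbin recursion) and inverse-filters the signal to recover the excitation. The sampling rate, formant frequencies, pole radius, pitch period, impulse positions, and all helper names (`resonator`, `lp_residual`, and so on) are hypothetical choices made for illustration only.

```python
import math

def resonator(freq_hz, radius, fs):
    """Second-order section 1 - 2r cos(theta) z^-1 + r^2 z^-2 (one formant)."""
    theta = 2.0 * math.pi * freq_hz / fs
    return [1.0, -2.0 * radius * math.cos(theta), radius * radius]

def poly_mul(p, q):
    """Multiply two polynomials in z^-1 given as coefficient lists."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def all_pole_filter(u, a):
    """Synthesis filter 1/A(z): s[n] = u[n] - sum_{k>=1} a[k] s[n-k]."""
    s = []
    for n in range(len(u)):
        acc = u[n]
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * s[n - k]
        s.append(acc)
    return s

def autocorr(x, max_lag):
    """Biased autocorrelation r[k] = sum_i x[i] x[i+k] for k = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Autocorrelation-method LP; returns A(z) coefficients [1, a1, ..., ap]."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k                  # prediction error energy
    return a

def lp_residual(x, order):
    """Inverse-filter x with its own LP polynomial: e[n] = sum_k a[k] x[n-k]."""
    a = levinson_durbin(autocorr(x, order), order)
    e = [sum(a[k] * x[n - k] for k in range(order + 1) if n - k >= 0)
         for n in range(len(x))]
    return e, a

# Toy voiced signal: unit impulses at hypothetical glottal closure instants,
# filtered by two hypothetical formants (500 Hz and 1500 Hz, pole radius 0.85).
FS = 8000
PERIOD = 100                          # 80 Hz pitch at 8 kHz
N = 800
GCIS = list(range(50, 700, PERIOD))   # 50, 150, ..., 650
excitation = [1.0 if n in GCIS else 0.0 for n in range(N)]
true_a = poly_mul(resonator(500, 0.85, FS), resonator(1500, 0.85, FS))
speech = all_pole_filter(excitation, true_a)

residual, est_a = lp_residual(speech, order=4)
# Indices of the len(GCIS) largest-magnitude residual samples.
peaks = sorted(range(N), key=lambda n: abs(residual[n]), reverse=True)[:len(GCIS)]
print(sorted(peaks) == GCIS)
```

On this idealized input the estimated coefficients closely match the synthesis filter. The largest residual samples fall on the impulse positions, which illustrates why peaks of the LP residual have been used as cues for glottal closure instants. Real speech would additionally need framing, windowing, and a more robust detector than simply picking the largest peaks.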

Original language: English
Pages (from-to): 1920-1941
Number of pages: 22
Journal: Proceedings of the IEEE
Volume: 109
Issue number: 12
Early online date: 29 Nov 2021
Publication status: Published, Dec 2021
MoE publication type: A2 Review article in a scientific journal

Keywords

  • Speech analysis
  • speech excitation
  • glottal closure instant (GCI)
  • glottal opening instant (GOI)
  • glottal inverse filtering
  • phonation type
  • emotional speech
  • nonverbal sounds
  • pathological voices
  • deep learning
