Extraction and Utilization of Excitation Information of Speech: A Review

Sudarsana Kadiri, Paavo Alku, Bayya Yegnanarayana

Research output: Contribution to journal › Review Article › Scientific › peer-review

Abstract

Speech production can be regarded as a process in which a time-varying vocal tract system (filter) is excited by a time-varying excitation. In addition to its linguistic message, the speech signal carries information about, for example, the gender and age of the speaker. It also contains acoustic cues about several speaker traits, such as the speaker's emotional state and state of health. To understand how the human speech production mechanism produces these acoustic cues, and to utilize this information in speech technology, it is necessary to extract features that describe both the excitation and the filter of the speech production mechanism. While methods for estimating and parameterizing the vocal tract system are well established, the excitation has been studied less. This article reviews signal processing approaches for extracting excitation information from speech. It highlights the importance of excitation information in the analysis and classification of phonation type and vocal emotions, in the analysis of nonverbal laughter sounds, and in the study of pathological voices. Furthermore, recent developments in deep learning techniques for extracting and utilizing excitation information are discussed.
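The source-filter view in the first sentence of the abstract can be illustrated with a minimal sketch, assuming Python with NumPy. It is not the authors' method: it uses classical autocorrelation-method linear prediction (LP) to estimate an all-pole "vocal tract" filter, then inverse-filters the signal to obtain the LP residual, a simple and common estimate of the excitation. The sampling rate, the two resonances (500 Hz and 1500 Hz, pole radius 0.95), the 100 Hz impulse-train excitation and the function names `all_pole_filter`, `lpc_autocorrelation` and `inverse_filter` are all hypothetical choices made for illustration.

```python
import numpy as np

def all_pole_filter(excitation, a):
    """Synthesis filter 1/A(z): s[n] = e[n] - sum_{k=1}^{p} a[k] * s[n-k].
    `a` = [1, a1, ..., ap], i.e. A(z) = 1 + a1 z^-1 + ... + ap z^-p."""
    p = len(a) - 1
    s = np.zeros(len(excitation))
    for n in range(len(excitation)):
        acc = excitation[n]
        for k in range(1, min(p, n) + 1):
            acc -= a[k] * s[n - k]
        s[n] = acc
    return s

def lpc_autocorrelation(signal, order):
    """Autocorrelation-method LP: solve the normal equations
    sum_{k=1}^{order} a[k] * r[|i-k|] = -r[i], i = 1..order, for A(z)."""
    n = len(signal)
    r = np.array([np.dot(signal[: n - lag], signal[lag:]) for lag in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    coeffs = np.linalg.solve(R, -r[1:])
    return np.concatenate(([1.0], coeffs))

def inverse_filter(signal, a):
    """FIR inverse filter A(z); its output (the LP residual) estimates the excitation."""
    return np.convolve(signal, a)[: len(signal)]

fs = 8000  # sampling rate in Hz (illustrative assumption)

# Hypothetical "vocal tract": conjugate pole pairs at 500 Hz and 1500 Hz.
poles = []
for f in (500.0, 1500.0):
    pole = 0.95 * np.exp(2j * np.pi * f / fs)
    poles += [pole, np.conj(pole)]
a_true = np.real(np.poly(poles))  # [1, a1, a2, a3, a4]

# Voiced-like excitation: impulse train with a 10 ms period (f0 = 100 Hz).
excitation = np.zeros(800)
excitation[::80] = 1.0

speech = all_pole_filter(excitation, a_true)       # filter excited by the source
a_est = lpc_autocorrelation(speech, order=4)       # estimate the filter
residual = inverse_filter(speech, a_est)           # estimate the excitation

# Indices of the ten largest |residual| samples; in this mild setting they are
# expected at or near the impulse instants (multiples of 80).
print(np.sort(np.argsort(np.abs(residual))[-10:]))
```

Inverse filtering with the true coefficients recovers the impulse train exactly, while the LP estimate only approximates the filter when the excitation is periodic. Real speech would additionally need framing, windowing and pre-emphasis, and dedicated glottal inverse filtering methods model the glottal source more explicitly than the LP residual does.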
Original language: English
Number of pages: 28
Journal: Proceedings of the IEEE
Publication status: Accepted/In press, 2021
MoE publication type: A2 Review article in a scientific journal

Keywords

  • Speech analysis
  • speech excitation
  • glottal closure instant (GCI)
  • glottal opening instant (GOI)
  • glottal inverse filtering
  • phonation type
  • emotional speech
  • nonverbal sounds
  • pathological voices
  • deep learning
