Analysis of Instantaneous Frequency Components of Speech Signals for Epoch Extraction

Sudarsana Kadiri, Paavo Alku, Bayya Yegnanarayana

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

The major impulse-like excitation in the speech signal is due to abrupt closure of the vocal folds, which takes place at the glottal closure instant (GCI) or epoch in each cycle. GCIs are used in many areas of speech science and technology, such as in prosody modification, voice source analysis, formant extraction and speech synthesis. It is difficult to observe these discontinuities (corresponding to GCIs) in the speech signal because of the superimposed time-varying response of the vocal tract system. This paper examines the phase part of different frequency components of the speech signal to extract epochs. Three analysis methods to decompose the speech signal into different frequency components are considered. These methods are the short-time Fourier transform (STFT), narrow bandpass filtering (NBPF), and single frequency filtering (SFF). The locations of the discontinuities in the speech signal are obtained from the instantaneous frequency (IF) (i.e., the time derivative of the phase) of each of the frequency components. A method for automatic detection of epochs using the amplitude weighted IF is proposed. Performance of the proposed epoch detection method is compared with four state-of-the-art methods in clean and telephone quality speech. The performance of the proposed method is comparable with the performance of the existing epoch detection methods for clean speech but better for telephone quality speech.

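
As an illustration of the IF definition used in the abstract (the time derivative of the phase), the following is a minimal sketch and not the authors' implementation. It assumes a single complex-valued frequency component is already available (for example from an STFT bin, a narrow bandpass filter, or SFF) and approximates the derivative by the phase advance between consecutive samples. The function name `instantaneous_frequency`, the sampling rate, the 100 Hz test tone, and the artificial phase jump are all hypothetical choices made for this example.

```python
import numpy as np

def instantaneous_frequency(component, fs):
    """Approximate IF (in Hz) of one complex-valued frequency component.

    The phase derivative is estimated as the phase advance between
    consecutive samples, which avoids explicit phase unwrapping.
    """
    component = np.asarray(component, dtype=complex)
    # Phase difference between sample n and n-1, wrapped to (-pi, pi].
    dphi = np.angle(component[1:] * np.conj(component[:-1]))
    return dphi * fs / (2.0 * np.pi)

# Toy input (assumption): a steady 100 Hz complex tone sampled at 8 kHz.
fs = 8000.0
n = np.arange(400)
tone = np.exp(2j * np.pi * 100.0 * n / fs)
if_steady = instantaneous_frequency(tone, fs)

# Toy discontinuity (assumption): an abrupt phase jump at sample 200.
tone_jump = tone.copy()
tone_jump[200:] *= np.exp(1j * np.pi / 2)
if_jump = instantaneous_frequency(tone_jump, fs)

# The IF stays flat for the steady tone and deviates sharply at the jump.
jump_index = int(np.argmax(np.abs(if_jump - 100.0)))
```

In this toy case the IF is constant for the steady tone and shows an isolated deviation at the sample pair spanning the phase jump, which is the kind of cue the abstract describes for locating discontinuities. Real speech components would additionally need the decomposition step and the amplitude weighting, neither of which is sketched here.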
Original language: English
Article number: 101443
Number of pages: 14
Journal: Computer Speech and Language
Volume: 78
Publication status: Accepted/In press - 2022
MoE publication type: A1 Journal article-refereed

Keywords

  • speech analysis
  • phase processing
  • instantaneous frequency
  • group delay
  • excitation source
  • glottal closure instants
  • epochs
