Tunable Q wavelet transform-based features in the classification of phonation types in the singing and speaking voice

Kiran Mittapalle*, Paavo Alku

*Corresponding author for this work

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

Phonation is the use of the laryngeal system, with the help of an air-stream provided by the respiratory system, to generate audible sounds. Humans are capable of producing voices of various phonation types (e.g., breathy, neutral and pressed), and these types are used both in singing and speaking. In this study, we propose to use features derived with the tunable Q wavelet transform (TQWT) for the classification of phonation types in the singing and speaking voice. In the proposed approach, the input voice signal is first decomposed into sub-bands using TQWT, and the Shannon wavelet entropy of each sub-band is then calculated. A feedforward neural network (FFNN) classifier is trained on the entropy values to discriminate between three phonation types (breathy, neutral and pressed). The results show that the proposed TQWT-based features outperformed six state-of-the-art features in the classification of phonation types in both the singing and speaking voice. Furthermore, the TQWT features achieved the highest phonation classification accuracies of 91% and 82% for the singing and speaking voice, respectively.
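
The abstract describes a three-step pipeline: TQWT decomposition into sub-bands, Shannon wavelet entropy per sub-band, and FFNN classification. The Python sketch below illustrates only the entropy-feature and classifier steps on synthetic data; the TQWT decomposition itself is not reproduced, and the entropy definition used here (energy-normalized squared coefficients), the network size, and all parameter values are illustrative assumptions rather than the authors' settings.

    import numpy as np
    from sklearn.neural_network import MLPClassifier

    def shannon_wavelet_entropy(subband):
        """Shannon entropy -sum(p * log p) of one sub-band, with p taken as the
        energy-normalized squared coefficients (one common definition)."""
        energy = np.square(np.asarray(subband, dtype=float))
        p = energy / (energy.sum() + 1e-12)   # normalize energies to a distribution
        p = p[p > 0]                          # avoid log(0)
        return float(-(p * np.log(p)).sum())

    def entropy_features(subbands):
        """One entropy value per sub-band -> one feature vector per signal."""
        return np.array([shannon_wavelet_entropy(s) for s in subbands])

    # Synthetic stand-in for the TQWT step: in practice `subbands` would come
    # from a TQWT decomposition of each voice signal (Q, redundancy and the
    # number of levels are the authors' design choices, not shown here).
    rng = np.random.default_rng(0)
    n_signals, n_subbands = 60, 12
    X = np.vstack([
        entropy_features([rng.normal(scale=1 + k, size=256) for k in range(n_subbands)])
        for _ in range(n_signals)
    ])
    y = rng.integers(0, 3, size=n_signals)  # dummy labels: 0=breathy, 1=neutral, 2=pressed

    # Small feed-forward network standing in for the paper's FFNN classifier.
    clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)
    clf.fit(X, y)
    print("training accuracy on dummy data:", clf.score(X, y))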
Original language: English
Number of pages: 7
Journal: Journal of Voice
Publication status: Accepted/In press - 2024
MoE publication type: A1 Journal article-refereed

Keywords

  • tunable Q wavelet transform
  • Shannon entropy
  • support vector machine
  • phonation types

