Classification of functional dysphonia using the tunable Q wavelet transform

Kiran Mittapalle*, Madhu Yagnavajjula, Paavo Alku

*Corresponding author for this work

Research output: Contribution to journal › Article › Scientific › peer-review



Functional dysphonia (FD) refers to an abnormality in voice quality in the absence of an identifiable lesion. In this paper, we propose an approach based on the tunable Q wavelet transform (TQWT) to automatically classify two types of FD (hyperfunctional dysphonia and hypofunctional dysphonia) from a healthy voice using the acoustic voice signal. Using TQWT, voice signals were decomposed into sub-bands and the entropy values extracted from the sub-bands were utilized as features for the studied 3-class classification problem. In addition, the Mel-frequency cepstral coefficient (MFCC) and glottal features were extracted from the acoustic voice signal and the estimated glottal source signal, respectively. A convolutional neural network (CNN) classifier was trained separately for the TQWT, MFCC and glottal features. Experiments were conducted using voice signals of 57 healthy speakers and 113 FD patients (72 with hyperfunctional dysphonia and 41 with hypofunctional dysphonia) taken from the VOICED database. These experiments revealed that the TQWT features yielded an absolute improvement of 5.5% and 4.5% compared to the baseline MFCC features and glottal features, respectively. Furthermore, the highest classification accuracy (67.91%) was obtained using the combination of the TQWT and glottal features, which indicates the complementary nature of these features.
Original language: English
Article number: 102989
Number of pages: 9
Journal: Speech Communication
Early online date: 6 Oct 2023
Publication status: Published - Nov 2023
MoE publication type: A1 Journal article-refereed


  • Functional dysphonia
  • tunable Q wavelet transform
  • glottal features
  • MFCC
  • convolutional neural network

