Abstract
Previous studies on the automatic classification of voice disorders have mostly investigated
the binary classification task, which aims to distinguish pathological voice from healthy voice.
Multi-class classifiers, however, enable a more fine-grained identification of voice disorders, which
is more helpful for clinical practitioners. Unfortunately, little training data is publicly available for
many voice disorders, which lowers classification performance on data from unseen speakers. Earlier
studies have shown that the use of glottal source features can reduce data redundancy in the detection of
laryngeal voice disorders. Another approach to tackling the problems caused by the scarcity of training data
is to use deep learning models, such as wav2vec 2.0 and HuBERT, that have been pre-trained on
larger databases. Since these two approaches have not been thoroughly studied in the multi-class classification of voice disorders, they are studied jointly in the present work. In addition, we study a hierarchical classifier, which enables task-wise feature optimization and more efficient use of data. These three approaches are compared with traditional mel-frequency cepstral coefficient (MFCC) features and with one-vs-rest and one-vs-one SVM classifiers. The results in a 3-class classification problem between healthy voice and two laryngeal disorders (hyperfunctional dysphonia and vocal fold paresis) indicate that all the studied methods outperform the baselines. The best performance was achieved by using features from wav2vec 2.0 LARGE together with hierarchical classification. The
balanced classification accuracy of this system was 62.77% for male speakers and 55.36% for female speakers, an absolute improvement over the baseline systems of 15.76 and 6.95 percentage points, respectively.
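The hierarchical decision and the balanced accuracy metric mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the hierarchy is arranged, so the two-stage split below (healthy vs. disordered first, then the disorder type) is an assumption, and `stage1`, `stage2`, and the toy 2-D feature vectors are hypothetical stand-ins for trained binary classifiers and real acoustic features; this is not the authors' implementation.

```python
from collections import defaultdict

HEALTHY = "healthy"
HYPERFUNCTIONAL = "hyperfunctional dysphonia"
PARESIS = "vocal fold paresis"


def hierarchical_predict(features, is_disordered, is_paresis):
    """Assumed two-stage decision: healthy vs. disordered, then disorder type."""
    if not is_disordered(features):
        return HEALTHY
    return PARESIS if is_paresis(features) else HYPERFUNCTIONAL


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall, so every class counts equally."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for truth, guess in zip(y_true, y_pred):
        total[truth] += 1
        correct[truth] += int(truth == guess)
    return sum(correct[label] / total[label] for label in total) / len(total)


# Hypothetical stand-ins for trained binary classifiers (e.g. SVMs):
# plain thresholds on toy 2-D feature vectors.
def stage1(features):
    return features[0] > 0.5


def stage2(features):
    return features[1] > 0.5


# Toy, made-up samples: (feature vector, true label). Not data from the paper.
samples = [
    ((0.1, 0.0), HEALTHY),
    ((0.2, 0.9), HEALTHY),
    ((0.7, 0.3), HEALTHY),          # misclassified as hyperfunctional dysphonia
    ((0.4, 0.1), HEALTHY),
    ((0.3, 0.6), HEALTHY),
    ((0.9, 0.2), HYPERFUNCTIONAL),
    ((0.8, 0.7), HYPERFUNCTIONAL),  # misclassified as vocal fold paresis
    ((0.6, 0.8), PARESIS),
    ((0.9, 0.9), PARESIS),
]

y_true = [label for _, label in samples]
y_pred = [hierarchical_predict(x, stage1, stage2) for x, _ in samples]
score = balanced_accuracy(y_true, y_pred)
print(f"balanced accuracy: {score:.4f}")
```

Because each stage is a separate binary task, each could in principle use its own feature set, which is one way to read the abstract's "task-wise feature optimization". If the reported absolute improvements are read as percentage points, they imply baseline balanced accuracies of roughly 62.77 − 15.76 = 47.01% for male speakers and 55.36 − 6.95 = 48.41% for female speakers.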
Original language | English |
---|---|
Pages (from-to) | 80-88 |
Number of pages | 9 |
Journal | IEEE Open Journal of Signal Processing |
Volume | 4 |
DOIs | |
Publication status | Published - 6 Feb 2023 |
MoE publication type | A1 Journal article-refereed |
Press/Media
- Data on Voice Disorders Discussed by Researchers at Aalto University (Hierarchical Multi-Class Classification of Voice Disorders Using Self-Supervised Models and Glottal Features), 09/03/2023