Abstract
Speech is the most natural form of human communication that not only conveys a message in the form of language, but also transfers information about speaker attributes such as emotions, age, and state of health. Systems that automatically classify this para- and extra-linguistic information have received considerable attention in speech technology. One area of this research is the study of automatic classification of voice disorders from speech. This area has potential applications in voice care as it provides objective and cost-effective analysis tools compared to subjective and time-consuming auditory-perceptual assessments currently performed by clinicians. This thesis investigates the effectiveness of various feature extraction methods and supervised machine learning (ML) and deep learning (DL) techniques in the development of automatic speech-based classification systems for two tasks: (1) classification of voice disorders and (2) classification of phonation types. Multiple classification systems are built, each designed to address a specific topic. The first topic is the study of systems to detect voice disorders in various laryngeal diseases from sustained phonation of vowels. Moreover, the application of data augmentation (DA) is studied in this topic. As the second topic, the thesis investigates the detection of dysarthria and the multi-class classification of the severity level of dysarthria from speech. As the third topic, the thesis investigates the classification of three phonation types (breathy, modal, and pressed). In the classification of voice disorders, the use of convolutional neural networks (CNNs) with 2-dimensional spectral feature representations achieved better performance than 1-dimensional features. Moreover, the use of DA methods in the system training showed absolute accuracy improvements of up to about 4%.
In the classification of dysarthria, the use of pre-trained model-based features showed in the best cases absolute accuracy improvements of about 10% compared to conventional features. Furthermore, fine-tuning the pre-trained models resulted in features with better generalization capabilities in dysarthria detection. In the classification of phonation types, the use of the neck surface accelerometer (NSA) signal showed better classification performance compared to the speech signal. In addition, pre-trained model-based features outperformed the conventional features for both speech and NSA signals. In conclusion, this thesis resulted in improvements in automatic, speech-based classification of voice disorders by combining ML and DL classifiers with spectral and pre-trained model-based features and by taking advantage of NSA in the classification of phonation types.
Translated title of the contribution | Automatic Classification of Voice Disorders and Phonation Types from Speech Signals |
---|---|
Original language | English |
Qualification | Doctor's degree |
Awarding institution | |
Supervisors/advisors | |
Publisher | |
Print ISBN | 978-952-64-2191-9 |
Electronic ISBN | 978-952-64-2192-6 |
Publication status | Published - 2024 |
MoE publication type | G5 Doctoral dissertation (article) |