Pre-trained Models for Detection and Severity Level Classification of Dysarthria from Speech

Farhad Javanmardi, Sudarsana Kadiri, Paavo Alku

Research output: Journal article › Article › Scientific › Peer-reviewed

Abstract

Automatic detection and severity level classification of dysarthria from speech enable noninvasive and effective diagnosis that can support clinical decisions about patients' medication and therapy. In this work, three pre-trained models (wav2vec2-BASE, wav2vec2-LARGE, and HuBERT) are studied as feature extractors for building automatic detection and severity level classification systems for dysarthric speech. The experiments were conducted using two publicly available databases (UA-Speech and TORGO). One machine learning-based model (support vector machine, SVM) and one deep learning-based model (convolutional neural network, CNN) were used as classifiers. To compare the performance of the wav2vec2-BASE, wav2vec2-LARGE, and HuBERT features, three popular acoustic feature sets were used as baselines: mel-frequency cepstral coefficients (MFCCs), openSMILE, and the extended Geneva minimalistic acoustic parameter set (eGeMAPS). Experimental results revealed that the features derived from the pre-trained models outperformed the three baseline features, and that the HuBERT features performed better than the wav2vec2-BASE and wav2vec2-LARGE features. In particular, in the detection problem, the HuBERT features showed absolute accuracy improvements over the best-performing baseline feature (openSMILE) ranging from 1.33% (SVM classifier, TORGO database) to 2.86% (SVM classifier, UA-Speech database). In the severity level classification problem, the HuBERT features showed absolute accuracy improvements over the best-performing baseline feature (eGeMAPS) ranging from 6.54% (SVM classifier, TORGO database) to 10.46% (SVM classifier, UA-Speech database).
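The abstract does not spell out how the frame-level outputs of a pre-trained model become a single input vector for the SVM. A common approach, assumed here rather than taken from the paper, is to read the hidden states of one transformer layer and average them over time. The sketch below uses the Hugging Face `transformers` class `HubertModel` with the public `facebook/hubert-base-ls960` checkpoint; the layer index, the mean pooling, the RBF kernel, and the helper names `extract_hidden_states`, `utterance_vector`, and `train_svm` are hypothetical illustration choices, not the authors' settings. The demo at the end runs on synthetic random arrays only, so it shows the mechanics and says nothing about real accuracy.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def extract_hidden_states(waveform, layer=6, model_name="facebook/hubert-base-ls960"):
    """Return frame-level activations (frames x dims) of one HuBERT layer.

    Assumes a 16 kHz mono waveform as a 1-D float array. The layer index and
    checkpoint are illustrative, not settings reported in the paper. The model
    is reloaded on every call for brevity.
    """
    import torch  # lazy imports: the pooling/SVM code below runs without them
    from transformers import HubertModel

    model = HubertModel.from_pretrained(model_name)
    model.eval()
    inputs = torch.tensor(np.asarray(waveform), dtype=torch.float32).unsqueeze(0)
    with torch.no_grad():
        outputs = model(inputs, output_hidden_states=True)
    # hidden_states[0] is the input to the first transformer layer,
    # hidden_states[k] is the output of transformer layer k.
    return outputs.hidden_states[layer][0].numpy()


def utterance_vector(frames):
    """Mean-pool (frames x dims) activations into one fixed-length vector."""
    return np.asarray(frames).mean(axis=0)


def train_svm(features, labels):
    """Fit a standardise-then-SVM classifier on utterance-level vectors."""
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
    return clf.fit(features, labels)


if __name__ == "__main__":
    # Synthetic stand-ins for layer activations (NOT speech data): variable
    # numbers of 768-dim frames, with class 1 shifted by 0.5 in every dimension.
    rng = np.random.default_rng(0)
    utterances = [rng.normal(0.5 * label, 1.0, size=(int(rng.integers(50, 200)), 768))
                  for label in (0, 1) for _ in range(20)]
    X = np.stack([utterance_vector(u) for u in utterances])
    y = np.array([0] * 20 + [1] * 20)
    clf = train_svm(X, y)
    print(X.shape, clf.score(X, y))
```

For the wav2vec2 variants, `Wav2Vec2Model` with a checkpoint such as `facebook/wav2vec2-base` could be swapped in the same way. Mean pooling discards temporal structure; the CNN classifier mentioned in the abstract could instead operate on frame-level features, which this sketch does not cover.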
Original language: English
Article number: 103047
Number of pages: 13
Journal: Speech Communication
Volume: 158
Status: Published - Mar 2024
OKM publication type: A1 Original article in a scientific journal
