Convolutional Neural Networks for Classification of Voice Qualities from Speech and Neck Surface Accelerometer Signals

Sudarsana Kadiri, Farhad Javanmardi, Paavo Alku

Research output: Article in book/conference proceedings; Conference article in proceedings; Scientific; peer-reviewed

3 Citations (Scopus)
112 Downloads (Pure)

Abstract

Prior studies in the automatic classification of voice quality have mainly used support vector machine (SVM) classifiers with the acoustic speech signal as input. Recently, one voice quality classification study was published using neck surface accelerometer (NSA) and speech signals as inputs and using SVMs with hand-crafted glottal source features. The present study examines simultaneously recorded NSA and speech signals in the classification of three voice qualities (breathy, modal, and pressed) using convolutional neural networks (CNNs) as the classifier. The study has two goals: (1) to investigate which of the two signals (NSA vs. speech) is more useful in the classification task, and (2) to determine whether deep learning-based CNN classifiers with spectrogram and mel-spectrogram features improve classification accuracy compared to SVM classifiers using hand-crafted glottal source features. The results indicated that the NSA signal yielded better classification of the voice qualities than the speech signal, and that the CNN classifier outperformed the SVM classifiers by large margins. The best mean classification accuracy was achieved with the mel-spectrogram as input to the CNN classifier (93.8% for NSA and 90.6% for speech).
Original language: English
Title of host publication: Proceedings of Interspeech'22
Publisher: International Speech Communication Association (ISCA)
Pages: 5253-5257
Number of pages: 5
Volume: 2022-September
Status: Published - 2022
MoE publication type: A4 Article in conference proceedings
Event: Interspeech - Incheon, South Korea
Duration: 18 Sept 2022 to 22 Sept 2022

Publication series

Name: Annual Conference of the International Speech Communication Association
Publisher: International Speech Communication Association
ISSN (electronic): 2958-1796

Conference

Conference: Interspeech
Country/Territory: South Korea
City: Incheon
Period: 18/09/2022 to 22/09/2022
