Convolutional Neural Networks for Classification of Voice Qualities from Speech and Neck Surface Accelerometer Signals

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

Abstract

Prior studies in the automatic classification of voice quality have mainly used support vector machine (SVM) classifiers with the acoustic speech signal as input. Recently, one voice quality classification study was published that used neck surface accelerometer (NSA) and speech signals as inputs to SVMs with hand-crafted glottal source features. The present study examines simultaneously recorded NSA and speech signals in the classification of three voice qualities (breathy, modal, and pressed) using convolutional neural networks (CNNs) as the classifier. The study has two goals: (1) to investigate which of the two signals (NSA vs. speech) is more useful in the classification task, and (2) to determine whether deep learning-based CNN classifiers with spectrogram and mel-spectrogram features improve classification accuracy over SVM classifiers using hand-crafted glottal source features. The results indicated that the NSA signal yielded better classification of the voice qualities than the speech signal, and that the CNN classifier outperformed the SVM classifiers by a large margin. The best mean classification accuracy was achieved with the mel-spectrogram as input to the CNN classifier (93.8% for NSA and 90.6% for speech).
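
As an illustration of the kind of mel-spectrogram input the abstract refers to, the following is a minimal sketch in plain Python. It is not the authors' feature extraction pipeline: the frame length, hop size, number of mel bands, the HTK-style mel formula, and the synthetic test tone are all assumptions chosen for the example, and the function names are hypothetical.

```python
import cmath
import math


def hz_to_mel(f_hz):
    """HTK-style mel scale (one common convention; assumed here)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale.

    Returns n_mels rows, each with n_fft // 2 + 1 weights.
    """
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    # n_mels + 2 edge frequencies: left edge, band centers, right edge.
    edges_hz = [mel_to_hz(mel_max * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(1, n_mels + 1):
        lo, mid, hi = edges_hz[m - 1], edges_hz[m], edges_hz[m + 1]
        row = []
        for f in bin_hz:
            rising = (f - lo) / (mid - lo)
            falling = (hi - f) / (hi - mid)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def power_spectrum(frame):
    """Power spectrum of one Hann-windowed frame via a direct DFT.

    A direct DFT is slow but adequate for the short frames used here.
    """
    n = len(frame)
    windowed = [
        x * (0.5 - 0.5 * math.cos(2.0 * math.pi * i / n))
        for i, x in enumerate(frame)
    ]
    spec = []
    for k in range(n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(windowed)
        )
        spec.append(abs(acc) ** 2)
    return spec


def log_mel_spectrogram(signal, sample_rate, n_fft=64, hop=32, n_mels=8):
    """Return a list of frames, each a list of n_mels log-energies."""
    bank = mel_filterbank(n_mels, n_fft, sample_rate)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        power = power_spectrum(signal[start:start + n_fft])
        mel_energy = [sum(w * p for w, p in zip(row, power)) for row in bank]
        frames.append([math.log(e + 1e-10) for e in mel_energy])
    return frames


# Synthetic 1 kHz tone at 8 kHz sampling, standing in for a real recording.
sr = 8000
tone = [math.sin(2.0 * math.pi * 1000.0 * t / sr) for t in range(512)]
mel = log_mel_spectrogram(tone, sr)
```

The resulting time-by-band matrix is the kind of two-dimensional representation that can be fed to a CNN as an image-like input; the same function could be applied to either the NSA or the speech channel.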
Original language: English
Title of host publication: Proceedings of Interspeech'22
Publisher: International Speech Communication Association (ISCA)
Pages: 5253-5257
Number of pages: 5
Volume: 2022-September
Publication status: Published - 2022
MoE publication type: A4 Article in a conference publication
Event: Interspeech - Incheon, Korea, Republic of
Duration: 18 Sept 2022 to 22 Sept 2022

Publication series

Name: Annual Conference of the International Speech Communication Association
Publisher: International Speech Communication Association
ISSN (Print): 1990-9772
ISSN (Electronic): 2308-457X

Conference

Conference: Interspeech
Country/Territory: Korea, Republic of
City: Incheon
Period: 18/09/2022 to 22/09/2022

Keywords

  • Voice quality
  • neck surface accelerometer
  • Mel-spectrogram
  • computational paralinguistics
  • CNNs
