Comparing 1-dimensional and 2-dimensional spectral feature representations in voice pathology detection using machine learning and deep learning classifiers

Farhad Javanmardi*, Sudarsana Kadiri, Manila Kodali, Paavo Alku

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review


Abstract

The present study investigates the use of 1-dimensional (1-D) and 2-dimensional (2-D) spectral feature representations in voice pathology detection with several classical machine learning (ML) and recent deep learning (DL) classifiers. Four widely used spectral feature representations (static mel-frequency cepstral coefficients (MFCCs), dynamic MFCCs, spectrogram, and mel-spectrogram) are derived from voice signals in both 1-D and 2-D form. Three widely used ML classifiers (support vector machine (SVM), random forest (RF), and AdaBoost) and three DL classifiers (deep neural network (DNN), long short-term memory (LSTM) network, and convolutional neural network (CNN)) are used with the 1-D feature representations. In addition, CNN classifiers are built using the 2-D feature representations. The widely used HUPA database is used in the pathology detection experiments. The experimental results show that the CNN classifier with the 2-D feature representations yields better accuracy than the ML and DL classifiers with the 1-D feature representations. The best performance was achieved by the 2-D CNN classifier based on dynamic MFCCs, which reached a detection accuracy of 81%.
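
To make the 1-D versus 2-D distinction concrete, the following is a minimal sketch using a plain log-magnitude spectrogram. It rests on assumptions that the abstract does not confirm: the frame length, hop size, and FFT size are illustrative values, the helper names `spectrogram_2d` and `to_1d` are hypothetical, and averaging over time is only one plausible way to obtain a 1-D representation, not necessarily the one used in the study. A synthetic sine wave stands in for a voice recording.

```python
import numpy as np

def spectrogram_2d(signal, frame_len=400, hop=160, n_fft=512):
    """Log-magnitude spectrogram with shape (n_freq_bins, n_frames).

    Frame length, hop, and FFT size are illustrative assumptions.
    """
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Index matrix that slices the signal into overlapping frames,
    # giving an array of shape (n_frames, frame_len).
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # rfft zero-pads each frame to n_fft and keeps n_fft // 2 + 1 bins.
    magnitude = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))
    # Small constant avoids log(0); transpose to (frequency, time).
    return np.log(magnitude + 1e-10).T

def to_1d(feature_2d):
    """Collapse the time axis by averaging (an assumed pooling choice)."""
    return feature_2d.mean(axis=1)

# Synthetic 1-second, 16 kHz signal standing in for a voice recording.
sr = 16000
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 150 * t)

feat_2d = spectrogram_2d(signal)  # time-frequency matrix, e.g. CNN input
feat_1d = to_1d(feat_2d)          # fixed-length vector, e.g. SVM input
print(feat_2d.shape, feat_1d.shape)
```

Under these assumptions, the 2-D array keeps the temporal structure that a convolutional model can exploit, while the 1-D vector gives a fixed-length input that classifiers such as an SVM or random forest can take directly.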
Original language: English
Title of host publication: INTERSPEECH 2022
Publisher: International Speech Communication Association (ISCA)
Pages: 2173-2177
Number of pages: 5
Volume: 2022-September
Publication status: Published - Sept 2022
MoE publication type: A4 Article in a conference publication
Event: Interspeech - Incheon, Korea, Republic of
Duration: 18 Sept 2022 to 22 Sept 2022

Publication series

Name: Interspeech
Publisher: International Speech Communication Association
ISSN (Print): 1990-9772
ISSN (Electronic): 2308-457X

Conference

Conference: Interspeech
Country/Territory: Korea, Republic of
City: Incheon
Period: 18/09/2022 to 22/09/2022
