Automatic classification of vocal intensity category from speech

Manila Kodali, Sudarsana Kadiri, Laura Laaksonen, Paavo Alku

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceedings › Scientific › peer-review

Abstract

Regulation of vocal intensity is a fundamental phenomenon in speech communication. Vocal intensity can be quantified using sound pressure level (SPL), which can be measured easily by recording a standard calibration signal together with the speech and comparing the energy of the recorded speech signal with that of the calibration tone. Unfortunately, most speech recordings are made without an SPL calibration signal, and speech signals are saved to databases using arbitrary amplitude scales. Therefore, neither the SPL nor the intensity category (e.g. soft or loud phonation) of a saved speech signal can be determined afterwards. Even though the original level information is lost when the signal is represented on an arbitrary amplitude scale, the speech signal contains other acoustic cues of vocal intensity. In this study, we investigate machine learning and deep learning based methods for automatic classification of vocal intensity category when the input speech is expressed using an arbitrary amplitude scale. A new gender-balanced database consisting of speech produced in four vocal intensity categories (soft, normal, loud, and very loud) was first recorded. Support vector machine and deep neural network (DNN) models were then used to develop automatic classification systems using spectrograms, mel-spectrograms, and mel-frequency cepstral coefficients as features. The DNN classifier using the mel-spectrogram showed the best classification accuracy, about 90%. The database is made publicly available at https://bit.ly/3tLPGRx.
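The calibration-based SPL measurement described in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the function name `spl_from_calibration`, the 94 dB calibration level, and the synthetic sine signals standing in for speech and the calibration tone are all assumptions chosen for illustration. The sketch also shows why the level is lost once the speech is stored on an arbitrary amplitude scale.

```python
import numpy as np

def spl_from_calibration(speech, calibration, calibration_spl_db=94.0):
    """Estimate the SPL (dB) of `speech` from a calibration tone of known SPL.

    Assumes both signals were captured with the same recording gain.
    The 94 dB default is an assumed calibrator level, not a value from the paper.
    """
    speech_power = np.mean(np.square(speech))
    calibration_power = np.mean(np.square(calibration))
    return calibration_spl_db + 10.0 * np.log10(speech_power / calibration_power)

# Synthetic stand-ins (hypothetical): a 1 kHz calibration tone and a
# 200 Hz "speech" signal whose amplitude is ten times smaller.
fs = 16000
t = np.arange(fs) / fs
calibration = 0.5 * np.sin(2 * np.pi * 1000 * t)
speech = 0.05 * np.sin(2 * np.pi * 200 * t)

spl = spl_from_calibration(speech, calibration)

# Scaling both signals by the same gain leaves the estimate unchanged ...
gain = 3.0
spl_same_gain = spl_from_calibration(gain * speech, gain * calibration)

# ... but rescaling only the speech (an arbitrary amplitude scale, with the
# calibration reference no longer matching) shifts the estimate by 20*log10(gain).
spl_rescaled = spl_from_calibration(gain * speech, calibration)

print(f"calibrated SPL: {spl:.2f} dB")
print(f"same gain on both: {spl_same_gain:.2f} dB")
print(f"speech rescaled alone: {spl_rescaled:.2f} dB")
```

Because the estimate depends only on the power ratio between speech and calibration tone, any rescaling applied to the speech alone changes the result, which is why classification from a stored signal has to rely on other acoustic cues.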
Original language: English
Title of host publication: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’23)
Publisher: IEEE
Number of pages: 5
ISBN (Electronic): 978-1-7281-6327-7
Publication status: Published - 2023
MoE publication type: A4 Conference publication
Event: IEEE International Conference on Acoustics, Speech, and Signal Processing - Rhodes Island, Greece
Duration: 4 Jun 2023 → 10 Jun 2023

Publication series

Name: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing
ISSN (Electronic): 2379-190X

Conference

Conference: IEEE International Conference on Acoustics, Speech, and Signal Processing
Abbreviated title: ICASSP
Country/Territory: Greece
City: Rhodes Island
Period: 04/06/2023 → 10/06/2023
