Wavelet scattering network features for intensity category classification and prediction of SPL from speech

Manila Kodali, Sudarsana Kadiri, Shrikanth Narayanan, Paavo Alku

Research output: Article in book/conference proceedings, Conference article in proceedings, Scientific, peer-reviewed

Abstract

Speakers change vocal intensity in daily life to communicate over long distances and to express vocal emotions. Humans produce speech using different intensity categories (e.g. soft, normal and loud voice) and they can regulate intensity across a wide sound pressure level (SPL) range. Knowing the intensity category or the SPL of speech is beneficial in speech-based biomarking of health. Recent studies have explored vocal intensity category classification and prediction of SPL from speech that has been recorded without SPL calibration information and is presented on an arbitrary amplitude scale. Using speech signals in such a scenario, this study investigates wavelet scattering network (WSN) features in two tasks: (1) classification of speech into four intensity categories (soft, normal, loud, very loud), a multi-class classification task, and (2) prediction of SPL, a regression task. In the former task, the WSN features showed absolute accuracy improvements of 4-14% compared to reference features. In the latter task, the WSN features improved the prediction of SPL by an average of 1-2 dB compared to the reference features.
Original language: English
Title of host publication: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’25)
Editors: Bhaskar D Rao, Isabel Trancoso, Gaurav Sharma, Neelesh B. Mehta
Publisher: IEEE
Number of pages: 5
ISBN (electronic): 979-8-3503-6874-1
Publication status: Published - 2025
MoE publication type: A4 Article in conference proceedings
Event: IEEE International Conference on Acoustics, Speech, and Signal Processing - Hyderabad, India
Duration: 6 Apr 2025 to 11 Apr 2025

Publication series

Name: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing
ISSN (electronic): 2379-190X

Conference

Conference: IEEE International Conference on Acoustics, Speech, and Signal Processing
Abbreviated title: ICASSP
Country/Territory: India
City: Hyderabad
Period: 06/04/2025 to 11/04/2025
