Classification of vocal intensity category from speech using the wav2vec2 and whisper embeddings

Manila Kodali, Sudarsana Kadiri, Paavo Alku

Research output: Article in book/conference proceedings › Conference article in proceedings › Scientific › peer-reviewed

6 Citations (Scopus)
130 Downloads (Pure)

Abstract

In speech communication, talkers regulate vocal intensity, resulting in speech signals of different intensity categories (e.g., soft, loud). Intensity category carries important information about the speaker's health and emotions. However, many speech databases lack calibration information, so sound pressure level cannot be measured from the recorded data. Machine learning can nevertheless be used to classify intensity category when calibration information is unavailable. This study investigates pre-trained model embeddings (Wav2vec2 and Whisper) for classifying vocal intensity category (soft, normal, loud, and very loud) from speech signals expressed using arbitrary amplitude scales. We use a new database consisting of two speaking tasks (sentence and paragraph). A support vector machine is used as the classifier. Our results show that the pre-trained model embeddings outperformed three baseline features, providing improvements of up to 7% (absolute) in accuracy.
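
The pipeline described in the abstract (utterance-level embeddings fed to a support vector machine with four intensity classes) might be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: random vectors stand in for Wav2vec2 or Whisper frame embeddings, and the mean pooling, feature standardization, RBF kernel, and all function names are hypothetical choices that the abstract does not specify.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# The four intensity categories named in the abstract.
CLASSES = ["soft", "normal", "loud", "very loud"]


def mean_pool(frames: np.ndarray) -> np.ndarray:
    """Collapse (num_frames, dim) frame-level embeddings into one (dim,) vector.

    Assumption: mean pooling over time; the abstract does not say how
    utterance-level features were obtained.
    """
    return frames.mean(axis=0)


def make_synthetic_utterances(n_per_class: int = 40, dim: int = 32, seed: int = 0):
    """Generate placeholder data: one random cluster per intensity category.

    These are NOT real speech embeddings; they only exercise the pipeline.
    """
    rng = np.random.default_rng(seed)
    centers = rng.normal(scale=3.0, size=(len(CLASSES), dim))
    features, labels = [], []
    for label, center in enumerate(centers):
        for _ in range(n_per_class):
            num_frames = rng.integers(50, 150)  # utterances vary in length
            frames = center + rng.normal(size=(num_frames, dim))
            features.append(mean_pool(frames))
            labels.append(label)
    return np.stack(features), np.array(labels)


X, y = make_synthetic_utterances()
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Assumption: standardized features and an RBF-kernel SVM.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

The accuracy printed here reflects only the synthetic clusters and says nothing about the results reported in the paper. With real data, the placeholder generator would be replaced by embeddings extracted from the pre-trained models.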

Original language: English
Title of host publication: Proceedings of Interspeech'23
Publisher: International Speech Communication Association (ISCA)
Pages: 4134-4138
Number of pages: 5
Volume: 2023-August
DOI - permanent links
Status: Published - 2023
MoEC publication type: A4 Article in conference proceedings
Event: Interspeech - Dublin, Ireland
Duration: 20 Aug 2023 to 24 Aug 2023

Publication series

Name: Interspeech
Publisher: International Speech Communication Association
ISSN (Print): 1990-9772
ISSN (Electronic): 2308-457X

Conference

Conference: Interspeech
Country/Territory: Ireland
City: Dublin
Period: 20/08/2023 to 24/08/2023
