Advancing Audio Emotion and Intent Recognition with Large Pre-Trained Models and Bayesian Inference

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceedings › Scientific › peer-review

1 Citation (Scopus)
38 Downloads (Pure)

Abstract

Large pre-trained models are essential in paralinguistic systems, demonstrating effectiveness in tasks like emotion recognition and stuttering detection. In this paper, we employ large pre-trained models for the ACM Multimedia Computational Paralinguistics Challenge, addressing the Requests and Emotion Share tasks. We explore audio-only and hybrid solutions that leverage the audio and text modalities. Our empirical results consistently show the superiority of the hybrid approaches over the audio-only models. Moreover, we introduce a Bayesian layer as an alternative to the standard linear output layer. The multimodal fusion approach achieves a UAR of 85.4% on HC-Requests and 60.2% on HC-Complaints. The ensemble model for the Emotion Share task yields the best ρ value of .614. The Bayesian wav2vec2 approach explored in this study lets us build ensembles cheaply, at the cost of fine-tuning only one model, and yields usable confidence values instead of the usual overconfident posterior probabilities.
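As an illustration of the Bayesian output layer and the cheap ensembling the abstract refers to, the sketch below replaces a standard linear classification head with a mean-field Gaussian layer and averages predictions over several weight samples drawn from a single fine-tuned model. This is not the authors' code: the feature dimension, class count, and initialisation values are assumptions made for the example, and the pooled tensor merely stands in for wav2vec2 embeddings.

```python
# Minimal sketch, assuming pooled wav2vec2 features and a binary intent task.
# A mean-field Gaussian ("Bayesian") output layer lets one model act as an
# ensemble: each forward pass samples new weights, and averaging the sampled
# class probabilities gives calibrated-looking confidence values.

import torch
import torch.nn as nn
import torch.nn.functional as F


class BayesianLinear(nn.Module):
    """Linear layer whose weights are sampled from learned Gaussians at each forward pass."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight_mu = nn.Parameter(torch.zeros(out_features, in_features))
        self.weight_rho = nn.Parameter(torch.full((out_features, in_features), -5.0))
        self.bias_mu = nn.Parameter(torch.zeros(out_features))
        self.bias_rho = nn.Parameter(torch.full((out_features,), -5.0))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # softplus keeps the standard deviations positive
        weight_sigma = F.softplus(self.weight_rho)
        bias_sigma = F.softplus(self.bias_rho)
        # reparameterization trick: sample weights while keeping gradients w.r.t. mu and rho
        weight = self.weight_mu + weight_sigma * torch.randn_like(weight_sigma)
        bias = self.bias_mu + bias_sigma * torch.randn_like(bias_sigma)
        return F.linear(x, weight, bias)


def ensemble_predict(head: BayesianLinear, features: torch.Tensor, n_samples: int = 10):
    """Average class probabilities over several weight samples from one trained head."""
    with torch.no_grad():
        probs = torch.stack(
            [F.softmax(head(features), dim=-1) for _ in range(n_samples)]
        ).mean(dim=0)
    # predictive entropy as a simple confidence signal
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
    return probs, entropy


if __name__ == "__main__":
    # stand-in for mean-pooled wav2vec2 hidden states (batch of 4, 768-dim features)
    pooled_features = torch.randn(4, 768)
    head = BayesianLinear(in_features=768, out_features=2)  # e.g. request vs. non-request
    probs, entropy = ensemble_predict(head, pooled_features, n_samples=20)
    print(probs, entropy)
```

In this sketch the ensemble comes for free at inference time: only the single Bayesian head is fine-tuned, and drawing more weight samples trades a little compute for smoother, less overconfident posteriors.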
Original language: English
Title of host publication: MM '23: Proceedings of the 31st ACM International Conference on Multimedia
Publisher: ACM
Pages: 9477-9481
Number of pages: 5
ISBN (Electronic): 979-8-4007-0108-5
DOIs
Publication status: Published - 27 Oct 2023
MoE publication type: A4 Conference publication
Event: ACM International Conference on Multimedia - Ottawa, Canada
Duration: 29 Oct 2023 – 29 Oct 2023
Conference number: 31

Conference

Conference: ACM International Conference on Multimedia
Abbreviated title: MM
Country/Territory: Canada
City: Ottawa
Period: 29/10/2023 – 29/10/2023
