Abstract
With the rapid advancement in automatic speech recognition and
natural language understanding, a complementary field (paralinguistics)
emerged, focusing on the non-verbal content of speech.
The ACM Multimedia 2022 Computational Paralinguistics Challenge introduced several exciting tasks in this field. In this work, we
focus on tackling two Sub-Challenges using modern pre-trained
models called wav2vec2. Our experimental results demonstrated
that wav2vec2 is an excellent tool for detecting the emotions behind vocalisations and recognising different types of stuttering.
Although these models achieve outstanding results on their own, our results
demonstrated that wav2vec2-based systems could be further improved by ensembling them with other models. Our best systems
outperformed the competition baselines by a considerable margin,
achieving an unweighted average recall of 44.0 (absolute improvement of 6.6% over baseline) on the Vocalisation Sub-Challenge and
62.1 (absolute improvement of 21.7% over baseline) on the Stuttering
Sub-Challenge.
Original language | English |
---|---|
Title of host publication | Proceedings of the 30th ACM International Conference on Multimedia |
Publisher | ACM |
Number of pages | 4 |
ISBN (Electronic) | 978-1-4503-9203-7 |
Publication status | Published - Oct 2022 |
MoE publication type | A4 Conference publication |
Event | ACM International Conference on Multimedia, Lisboa, Portugal. Duration: 10 Oct 2022 → 14 Oct 2022. Conference number: 30 |
Conference
Conference | ACM International Conference on Multimedia |
---|---|
Abbreviated title | MM |
Country/Territory | Portugal |
City | Lisboa |
Period | 10/10/2022 → 14/10/2022 |
Fingerprint
Dive into the research topics of 'Wav2vec2-based Paralinguistic Systems to Recognise Vocalised Emotions and Stuttering'. Together they form a unique fingerprint.
Projects
- USSEE: Understanding Speech and Scene with Ears and Eyes
  Kurimo, M. (Principal investigator), Virkkunen, A. (Project Member) & Grósz, T. (Project Member)
  01/01/2022 → 31/12/2024
  Project: Academy of Finland: Other research funding
- TEFLON: Technology-enhanced foreign and second-language learning of Nordic languages
  Kurimo, M. (Principal investigator), Al-Ghezi, R. (Project Member), Smolander, A.-R. (Project Member), Getman, Y. (Project Member), Phan, N. (Project Member), Grósz, T. (Project Member) & Voskoboinik, E. (Project Member)
  01/04/2021 → 31/03/2025
  Project: Other external funding: Other foreign funding