Abstract
Speech embeddings, fixed-size representations derived from raw audio data, play a crucial role in diverse machine learning applications. Despite the abundance of speech embedding techniques, selecting the most suitable one remains challenging. Existing studies often focus on either intrinsic or extrinsic aspects and seldom explore both simultaneously. Furthermore, comparisons between state-of-the-art pre-trained models and earlier speech embedding solutions are notably scarce in the literature. To address these gaps, we undertake a comprehensive evaluation of both small and large-scale speech embedding models, which, in our opinion, needs to incorporate both intrinsic and extrinsic assessments. The intrinsic experiments examine the models' ability to capture speaker-related characteristics and assess their discriminative capacities, providing insights into their inherent capabilities and internal workings. Concurrently, the extrinsic experiments evaluate whether the models learned semantic cues during pre-training. The findings underscore the superior performance of the large-scale pre-trained models, albeit at an elevated computational cost. The base self-supervised models achieve results comparable to their large counterparts, making them a better choice for many applications. Furthermore, we show that when only the most crucial dimensions are selected, the models' performance often does not suffer drastically and in some cases even improves. This research contributes valuable insights into the nuanced landscape of speech embeddings, aiding researchers and practitioners in making informed choices for various applications.
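The abstract describes two ideas: turning variable-length audio into a fixed-size vector, and keeping only the most crucial embedding dimensions. The sketch below illustrates both under stated assumptions. It assumes frame-level feature vectors already come from some upstream encoder, uses mean pooling over time as the aggregation step, and ranks dimensions by their variance across utterances. Mean pooling, the variance criterion, and the function names `mean_pool`, `top_k_dims_by_variance`, and `select_dims` are hypothetical illustrative choices, not necessarily the method used in the paper.

```python
from __future__ import annotations

from statistics import fmean, pvariance


def mean_pool(frames: list[list[float]]) -> list[float]:
    """Average frame-level vectors over time into one fixed-size vector.

    Assumption: mean pooling is an illustrative aggregation choice; the
    number of frames may vary, but every frame must share one dimensionality.
    """
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    if any(len(f) != dim for f in frames):
        raise ValueError("all frames must have the same dimensionality")
    return [fmean(f[d] for f in frames) for d in range(dim)]


def top_k_dims_by_variance(embeddings: list[list[float]], k: int) -> list[int]:
    """Return indices of the k dimensions with the highest variance.

    Hypothetical selection criterion: variance across utterances stands in
    for whatever importance measure a real study would use.
    """
    dim = len(embeddings[0])
    scores = [pvariance([e[d] for e in embeddings]) for d in range(dim)]
    ranked = sorted(range(dim), key=lambda d: scores[d], reverse=True)
    return sorted(ranked[:k])  # keep the original dimension order


def select_dims(embedding: list[float], dims: list[int]) -> list[float]:
    """Keep only the chosen dimensions of one embedding."""
    return [embedding[d] for d in dims]


if __name__ == "__main__":
    # Three toy "utterances" with different numbers of 3-dim frames.
    utterances = [
        [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]],
        [[0.0, 1.0, 2.0], [0.0, 1.0, 2.0], [0.0, 1.0, 2.0]],
        [[4.0, 0.5, 2.0]],
    ]
    embeddings = [mean_pool(u) for u in utterances]
    dims = top_k_dims_by_variance(embeddings, k=2)
    print(embeddings)
    print(dims)
    print([select_dims(e, dims) for e in embeddings])
```

In this toy input the third dimension is constant across utterances, so a variance ranking discards it first; the reduced vectors keep the dimensions that actually distinguish the utterances.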
Original language | English |
---|---|
Pages | 3546-3560 |
Number of pages | 15 |
Journal | IEEE/ACM Transactions on Audio Speech and Language Processing |
Volume | 32 |
DOI - permanent links | |
Status | Published - 2024 |
Publication type (OKM) | A1 Original article in a scientific journal |
Fingerprint
Dive into the research topics of 'From Raw Speech to Fixed Representations: A Comprehensive Evaluation of Speech Embedding Techniques'. Together they form a unique fingerprint.
Projects
- 1 Active
- LAREINA: LAREINA - Language Resource Infrastructure for AI
Kurimo, M. (Principal investigator), Moisio, A. (Project member), Getman, Y. (Project member), Porjazovski, D. (Project member), Rouhe, A. (Project member) & Virkkunen, A. (Project member)
01/01/2023 → 31/12/2025
Project: Business Finland: Strategic centres for science, technology and innovation (SHOK)