Development of a speech emotion recognizer for large-scale child-centered audio recordings from a hospital environment

Einari Vaaras*, Sari Ahlqvist-Björkroth, Konstantinos Drossos, Liisa Lehtonen, Okko Räsänen

*Corresponding author for this work

    Research output: Journal article, Article, Scientific, peer-reviewed

    Abstract

    In order to study how early emotional experiences shape infant development, one approach is to analyze the emotional content of speech heard by infants, as captured by child-centered daylong recordings, and as analyzed by automatic speech emotion recognition (SER) systems. However, since large-scale daylong audio is initially unannotated and differs from typical speech corpora from controlled environments, there are no existing in-domain SER systems for the task. Based on existing literature, it is also unclear which approach is best for deploying a SER system for a new domain. Consequently, in this study, we investigated alternative strategies for deploying a SER system for large-scale child-centered audio recordings from a neonatal hospital environment, comparing cross-corpus generalization, active learning (AL), and domain adaptation (DA) methods in the process. We first conducted simulations with existing emotion-labeled speech corpora to find the best strategy for SER system deployment. We then tested how the findings generalize to our new initially unannotated dataset. As a result, we found that the studied AL method provided overall the most consistent results, being less dependent on the specifics of the training corpora or speech features compared to the alternative methods. However, in situations without the possibility to annotate data, unsupervised DA proved to be the best approach. We also observed that deployment of a SER system for real-world daylong child-centered audio recordings achieved a SER performance level comparable to those reported in the literature, and that the amount of human effort required for the system deployment was overall relatively modest.

    Original language: English
    Pages: 9-22
    Number of pages: 14
    Journal: Speech Communication
    Volume: 148
    Early online date: 13 Feb 2023
    Status: Published - Mar 2023
    OKM publication type: A1 Original article in a scientific journal
