
Automatic analysis of the emotional content of speech in daylong child-centered recordings from a neonatal intensive care unit

  • Einari Vaaras
  • Sari Ahlqvist-Björkroth
  • Konstantinos Drossos
  • Okko Räsänen

    Research output: Chapter in Book/Report/Conference proceeding; Conference article in proceedings; Scientific; peer-review

    1 Citation (Scopus)
    90 Downloads (Pure)

    Abstract

    Researchers have recently started to study how the emotional speech heard by young infants can affect their developmental outcomes. As a part of this research, hundreds of hours of daylong recordings from preterm infants' audio environments were collected from two hospitals in Finland and Estonia in the context of the so-called APPLE study. In order to analyze the emotional content of speech in such a massive dataset, an automatic speech emotion recognition (SER) system is required. However, there are no emotion labels or existing in-domain SER systems to be used for this purpose. In this paper, we introduce this initially unannotated large-scale real-world audio dataset and describe the development of a functional SER system for the Finnish subset of the data. We explore the effectiveness of alternative state-of-the-art techniques for deploying an SER system to a new domain, comparing cross-corpus generalization, WGAN-based domain adaptation, and active learning in the task. As a result, we show that the best-performing models achieve a classification performance of 73.4% unweighted average recall (UAR) and 73.2% UAR in binary classification of valence and arousal, respectively. The results also show that active learning achieves the most consistent performance compared to the two alternatives.
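
The abstract reports results as unweighted average recall (UAR), i.e. the mean of the per-class recalls, so that each class counts equally regardless of how many samples it has. The following is a minimal sketch of that metric for a binary task. The function name and the toy labels are hypothetical illustrations, not code or data from the paper.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls (each class weighted equally)."""
    correct = defaultdict(int)  # correctly predicted samples per true class
    total = defaultdict(int)    # number of samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced binary valence labels (NOT data from the paper).
y_true = ["neg"] * 8 + ["pos"] * 2
y_pred = ["neg"] * 8 + ["neg", "pos"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.2f}, accuracy = {accuracy:.2f}")
```

In this toy case plain accuracy is 0.90 while UAR is 0.75, because the minority class is recalled only half of the time. This is one reason UAR is commonly preferred over accuracy when class sizes are unbalanced.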

    Original language: English
    Title of host publication: 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
    Publisher: International Speech Communication Association (ISCA)
    Pages: 526-530
    Number of pages: 5
    ISBN (Electronic): 9781713836902
    Publication status: Published - 2021
    MoE publication type: A4 Conference publication
    Event: Interspeech - Brno, Czech Republic
    Duration: 30 Aug 2021 to 3 Sept 2021
    Conference number: 22

    Publication series

    Name: Proceedings of the Annual Conference of the International Speech Communication Association
    ISSN (Print): 2308-457X
    ISSN (Electronic): 1990-9772

    Conference

    Conference: Interspeech
    Abbreviated title: INTERSPEECH
    Country/Territory: Czech Republic
    City: Brno
    Period: 30/08/2021 to 03/09/2021

    Keywords

    • Daylong audio
    • LENA recorder
    • Real-world audio
    • Speech analysis
    • Speech emotion recognition
