CURRICULUM LEARNING WITH AUDIO DOMAIN DATA AUGMENTATION FOR SOUND EVENT LOCALIZATION AND DETECTION

Research output: Contribution to conference, Paper, Scientific

Abstract

In this report, we explore a variety of data augmentation techniques in the audio domain, along with a curriculum learning approach, for sound event localization and detection (SELD) tasks. We focus our work on two areas: 1) techniques that modify the timbral or temporal characteristics of all channels simultaneously, such as equalization or added noise; 2) methods that transform the spatial impression of the full sound scene, such as directional loudness modifications. We test the approach on models using either time-frequency or raw audio features, trained and evaluated on the STARSS22: Sony-TAU Realistic Spatial Soundscapes 2022 dataset. Although the proposed system struggles to beat the official benchmark system, the augmentation techniques show improvements over our non-augmented baseline.
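
As an illustration of the first category (augmentations applied to all channels simultaneously), the following is a minimal sketch of added noise at a target signal-to-noise ratio. It is not the authors' implementation: the function name `add_noise_all_channels`, the `(channels, samples)` array layout, the use of white Gaussian noise, and the choice of a single noise level derived from the power across all channels are assumptions made for this example.

```python
import numpy as np


def add_noise_all_channels(audio, snr_db, rng=None):
    """Add white Gaussian noise to a multichannel clip at a target SNR.

    Hypothetical helper for illustration only.
    audio  : array of shape (channels, samples)
    snr_db : target signal-to-noise ratio in dB, measured over all channels
    rng    : optional numpy.random.Generator for reproducibility
    """
    rng = np.random.default_rng() if rng is None else rng
    audio = np.asarray(audio, dtype=np.float64)

    # Assumption: one noise level for the whole clip, computed from the
    # mean power over every channel, so no channel-dependent gain is applied
    # to the original signal.
    signal_power = np.mean(audio ** 2)
    if signal_power == 0.0:
        # Silent input: the SNR is undefined, so return the clip unchanged.
        return audio.copy()

    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.standard_normal(audio.shape) * np.sqrt(noise_power)
    return audio + noise
```

A call such as `add_noise_all_channels(clip, snr_db=20.0)` returns a noisy copy with the same shape as `clip`. Under a curriculum schedule, one plausible option (again an assumption, not stated in the abstract) would be to start with a high `snr_db` and lower it as training progresses.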
Original language: English
Publication status: Published - 15 Jul 2022
MoE publication type: Not Eligible

Keywords

  • sound event localization and detection
  • sound event localization
  • sound event detection
