CURRICULUM LEARNING WITH AUDIO DOMAIN DATA AUGMENTATION FOR SOUND EVENT LOCALIZATION AND DETECTION

Research output: Chapter in book/conference proceedings; Conference contribution; Scientific

Abstract

In this report we explore a variety of data augmentation techniques in the audio domain, along with a curriculum learning approach, for sound event localization and detection (SELD) tasks. We focus our work on two areas: 1) techniques that modify the timbral or temporal characteristics of all channels simultaneously, such as equalization or added noise; 2) methods that transform the spatial impression of the full sound scene, such as directional loudness modifications. We test the approach on models using either time-frequency or raw audio features, trained and evaluated on the STARSS22: Sony-TAU Realistic Spatial Soundscapes 2022 dataset. Although the proposed system struggles to beat the official benchmark system, the augmentation techniques show improvements over our non-augmented baseline.
Original language: English
Status: Published - 15 Jul 2022
MoE publication type: Not eligible
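
The first category of augmentations named in the abstract (equalization and added noise applied to all channels simultaneously) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `add_noise` and `band_gain_eq`, the white Gaussian noise model, the single-band gain, and the 4-channel, 24 kHz example clip are all assumptions chosen for illustration.

```python
import numpy as np


def add_noise(audio, snr_db, rng):
    """Add white Gaussian noise to every channel at one shared target SNR.

    audio: array of shape (channels, samples).
    The noise level is derived from the power averaged over all channels,
    so every channel receives noise of the same variance (an assumption).
    """
    signal_power = np.mean(audio ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=audio.shape)
    return audio + noise


def band_gain_eq(audio, sample_rate, low_hz, high_hz, gain_db):
    """Apply the same gain to one frequency band of every channel.

    Because the identical real-valued gain curve multiplies each channel's
    spectrum, inter-channel level and phase differences are unchanged.
    """
    n = audio.shape[-1]
    spectrum = np.fft.rfft(audio, axis=-1)
    freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate)
    gains = np.ones_like(freqs)
    gains[(freqs >= low_hz) & (freqs <= high_hz)] = 10.0 ** (gain_db / 20.0)
    return np.fft.irfft(spectrum * gains, n=n, axis=-1)


# Hypothetical usage on a random 4-channel, 1-second clip at 24 kHz.
rng = np.random.default_rng(0)
clip = rng.standard_normal((4, 24000))
augmented = add_noise(band_gain_eq(clip, 24000, 500.0, 2000.0, 6.0), 20.0, rng)
```

Applying one gain curve and one noise level across all channels changes the timbre of the recording while leaving the inter-channel relationships that encode direction intact, which is why such transforms do not require the location labels to be modified. Spatial transforms such as directional loudness modifications are a separate category and are not covered by this sketch.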
