The Aalto system based on fine-tuned AudioSet features for DCASE2018 task2 - general purpose audio tagging

Research output: Article in book/conference proceedings, peer-reviewed




In this paper, we present a neural network system for DCASE 2018 task 2, general-purpose audio tagging. We fine-tuned the Google AudioSet feature generation model with different settings for the given 41 classes, with a fully connected layer of 100 units added on top. We then used the fine-tuned models to generate 128-dimensional features for each 0.960 s segment of audio. We tried different neural network structures, including LSTM and multi-level attention models; in our experiments, the multi-level attention model outperformed the others. Our experiments also used silence truncation, repeating and splitting the audio into fixed-length segments, pitch-shifting augmentation, and mixup. The proposed system achieved a MAP@3 score of 0.936, which outperforms the baseline result of 0.704 and places it in the top 8% of the public leaderboard.
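The MAP@3 metric quoted in the abstract can be illustrated with a short sketch. It assumes that every clip has exactly one ground-truth label and that a system submits up to three ranked guesses per clip, so a clip scores 1, 1/2, or 1/3 depending on the rank of the correct label, and 0 if it is missed. The function name `map_at_3` and the example labels are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def map_at_3(true_labels: Sequence[str],
             ranked_predictions: Sequence[Sequence[str]]) -> float:
    """Mean average precision at 3, assuming ONE true label per clip.

    Each clip contributes 1/rank if its true label appears among the
    first three predictions (rank is 1-based), otherwise 0.
    """
    if len(true_labels) != len(ranked_predictions):
        raise ValueError("need one prediction list per true label")
    if not true_labels:
        raise ValueError("cannot score an empty set of clips")

    total = 0.0
    for truth, preds in zip(true_labels, ranked_predictions):
        # Only the top three guesses count toward the score.
        for rank, label in enumerate(preds[:3], start=1):
            if label == truth:
                total += 1.0 / rank
                break  # a single true label can only be matched once
    return total / len(true_labels)


# Hypothetical example: hit at rank 1, hit at rank 2, miss.
truth = ["Bark", "Flute", "Cough"]
preds = [
    ["Bark", "Meow", "Cough"],
    ["Oboe", "Flute", "Clarinet"],
    ["Laughter", "Bark", "Meow"],
]
score = map_at_3(truth, preds)
print(score)  # (1 + 1/2 + 0) / 3 = 0.5
```

Under this definition, a score of 0.936 means the correct label was ranked first for the large majority of clips, since a hit at rank 2 or 3 contributes only 0.5 or about 0.33.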


Title: Proceedings of the Detection and Classification of Acoustic Scenes and Events 2018 Workshop (DCASE2018)
Status: Published - November 2018
MEC publication type: A4 Article in conference proceedings
Event: Detection and Classification of Acoustic Scenes and Events - Surrey, United Kingdom
Duration: 19 November 2018 to 20 November 2018


Workshop: Detection and Classification of Acoustic Scenes and Events


ID: 30233108