Sound event detection in real life recordings using coupled matrix factorization of spectral representations and class activity annotations

Annamaria Mesaros, Toni Heittola, Onur Dikmen, Tuomas Virtanen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

57 Citations (Scopus)

Abstract

Methods for the detection of overlapping sound events in audio often rely on matrix factorization, assigning separated components to event classes. We present a method that bypasses the supervised construction of class models. The method learns the components as a non-negative dictionary in a coupled matrix factorization problem, where the spectral representation and the class activity annotation of the audio signal share the activation matrix. In testing, the dictionaries are used to estimate the class activations directly. To deal with large amounts of training data, two methods are proposed for reducing the size of the dictionary. The methods were tested on a database of real-life recordings, and outperformed previous approaches by over 10%.
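The coupled factorization described in the abstract can be illustrated with a minimal sketch: the spectrogram V and the binary class-activity annotation C are stacked and factorized with a shared activation matrix A, using standard multiplicative updates for the squared Frobenius cost. This is an assumption-laden illustration (function names, the Frobenius objective, and the update rules are choices made here), not the authors' exact formulation.

```python
import numpy as np

def coupled_nmf(V, C, n_components=20, n_iter=200, eps=1e-9):
    # Coupled NMF sketch (not the paper's exact algorithm):
    # spectrogram V (freq x time) and class activity C (classes x time)
    # share one activation matrix A, i.e. V ~ Ws A and C ~ Wc A.
    F, T = V.shape
    K, _ = C.shape
    rng = np.random.default_rng(0)
    Ws = rng.random((F, n_components))
    Wc = rng.random((K, n_components))
    A = rng.random((n_components, T))
    X = np.vstack([V, C])  # stacked observation shares the activations
    for _ in range(n_iter):
        W = np.vstack([Ws, Wc])
        # multiplicative updates for ||X - W A||_F^2
        A *= (W.T @ X) / (W.T @ W @ A + eps)
        Ws *= (V @ A.T) / (Ws @ A @ A.T + eps)
        Wc *= (C @ A.T) / (Wc @ A @ A.T + eps)
    return Ws, Wc, A

def detect(V_test, Ws, Wc, n_iter=200, eps=1e-9):
    # Test phase: estimate activations from the spectrogram alone
    # with Ws fixed, then project through Wc to obtain class activations.
    rng = np.random.default_rng(1)
    A = rng.random((Ws.shape[1], V_test.shape[1]))
    for _ in range(n_iter):
        A *= (Ws.T @ V_test) / (Ws.T @ Ws @ A + eps)
    return Wc @ A
```

Thresholding the rows of the returned class-activation matrix would then yield frame-level event detections.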

Original language: English
Title of host publication: 2015 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015 - Proceedings
Publisher: IEEE
Pages: 151-155
Number of pages: 5
Volume: 2015-August
ISBN (Electronic): 9781467369978
DOIs
Publication status: Published - 4 Aug 2015
MoE publication type: A4 Article in a conference publication
Event: IEEE International Conference on Acoustics, Speech, and Signal Processing - Brisbane, Australia
Duration: 19 Apr 2015 - 24 Apr 2015
Conference number: 40

Conference

Conference: IEEE International Conference on Acoustics, Speech, and Signal Processing
Abbreviated title: ICASSP
Country: Australia
City: Brisbane
Period: 19/04/2015 - 24/04/2015

Keywords

  • coupled non-negative matrix factorization
  • non-negative dictionaries
  • sound event detection
