Abstract
Audio inpainting aims to reconstruct missing segments in corrupted recordings. Most existing methods produce plausible reconstructions when the gap lengths are short but struggle to reconstruct gaps larger than about 100 ms. This paper explores diffusion models, a recent class of deep learning models, for the task of audio inpainting. The proposed method uses an unconditionally trained generative model, which can be conditioned in a zero-shot fashion for audio inpainting and is able to regenerate gaps of any size. An improved deep neural network architecture based on the constant-Q transform that allows the model to exploit pitch-equivariant symmetries in audio is also presented. The performance of the proposed algorithm is evaluated through objective and subjective metrics for the task of reconstructing short to mid-sized gaps, up to 300 ms. The results of a formal listening test indicate that, for short gaps in the range of 50 ms, the proposed method delivers performance comparable to the baselines. For wider gaps up to 300 ms long, the authors’ method outperforms the baselines and retains good or fair audio quality. The method presented in this paper can be applied to restoring sound recordings that suffer from severe local disturbances or dropouts.
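The abstract describes conditioning an unconditionally trained model on the observed audio around a gap. As a rough illustration of that general idea only, the sketch below builds a gap mask and applies a simple masking-based data-consistency step, which keeps the observed samples and accepts generated content only inside the gap. This is a generic technique and not necessarily the conditioning scheme used in the paper. The function names, the 44.1 kHz sampling rate, and the random placeholder standing in for a model's output are all assumptions made for the example.

```python
import numpy as np


def make_gap_mask(num_samples, sample_rate, gap_start_s, gap_ms):
    """Return a mask that is 1.0 where audio is observed and 0.0 inside the gap."""
    mask = np.ones(num_samples, dtype=np.float32)
    start = int(round(gap_start_s * sample_rate))
    length = int(round(gap_ms * 1e-3 * sample_rate))
    mask[start:start + length] = 0.0
    return mask


def enforce_data_consistency(estimate, observed, mask):
    """Keep observed samples; use the model's estimate only inside the gap."""
    return mask * observed + (1.0 - mask) * estimate


sample_rate = 44100  # assumed sampling rate, not taken from the paper
t = np.arange(sample_rate) / sample_rate
clean = np.sin(2 * np.pi * 440.0 * t).astype(np.float32)

# A 300 ms gap, the widest gap length mentioned in the abstract.
mask = make_gap_mask(len(clean), sample_rate, gap_start_s=0.5, gap_ms=300)
corrupted = clean * mask

# Hypothetical placeholder for a generative model's current estimate.
rng = np.random.default_rng(0)
estimate = rng.standard_normal(len(clean)).astype(np.float32)

restored = enforce_data_consistency(estimate, corrupted, mask)
gap_len = int((mask == 0).sum())
```

In an iterative sampler, a step like `enforce_data_consistency` would typically be applied at every iteration, so that the generated signal stays anchored to the reliable samples on both sides of the gap.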
Original language | English |
---|---|
Pages | 100-113 |
Number of pages | 14 |
Journal | AES: Journal of the Audio Engineering Society |
Volume | 72 |
Issue | 3 |
DOI - permanent links | |
Status | Published - Mar 2024 |
OKM publication type | A1 Original article in a scientific journal |
Projects
- 1 Finished
- NordicSMC: Nordic Sound and Music Computing Network
Välimäki, V. (Principal investigator), Louise, B. (Project member), Fagerström, J. (Project member) & Prawda, K. (Project member)
01/01/2018 → 31/12/2023
Project: Other external funding: Other foreign funding