Abstract

This paper presents CQT-Diff, a data-driven generative audio model that, once trained, can be used to solve a variety of audio inverse problems in a problem-agnostic setting. CQT-Diff is a neural diffusion model whose architecture is carefully constructed to exploit the pitch-equivariant symmetries of music. This is achieved by preconditioning the model with an invertible Constant-Q Transform (CQT), whose logarithmically spaced frequency axis turns pitch equivariance into translation equivariance. The proposed method is evaluated on solo piano music, using objective and subjective metrics, in three distinct tasks: audio bandwidth extension, inpainting, and declipping. The results show that CQT-Diff outperforms the compared baselines and ablations in audio bandwidth extension and, without retraining, delivers competitive performance against modern baselines in audio inpainting and declipping. This work represents the first diffusion-based general framework for solving inverse problems in audio processing.
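To illustrate why a logarithmically spaced frequency axis turns a pitch shift into a translation, the following is a minimal NumPy sketch. It assumes a naive, non-invertible CQT computed by direct correlation with Hann-windowed complex exponentials; the helper names `cqt_center_frequencies` and `naive_cqt_magnitude`, the 55 Hz lowest bin, and the choice of 24 bins per octave are hypothetical illustration choices, not taken from the paper.

```python
import numpy as np


def cqt_center_frequencies(f_min, bins_per_octave, n_bins):
    """Geometrically spaced centre frequencies f_k = f_min * 2**(k / B)."""
    return f_min * 2.0 ** (np.arange(n_bins) / bins_per_octave)


def naive_cqt_magnitude(x, sr, f_min=55.0, bins_per_octave=24, n_bins=144):
    """Direct, non-invertible CQT magnitude of the start of a signal (illustration only).

    Each bin correlates x with a Hann-windowed complex exponential whose length
    is inversely proportional to its centre frequency, so every bin has the same
    quality factor Q (centre frequency divided by bin spacing).
    """
    freqs = cqt_center_frequencies(f_min, bins_per_octave, n_bins)
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)  # constant Q for adjacent-bin spacing
    mags = np.empty(n_bins)
    for k, f_k in enumerate(freqs):
        n_k = int(np.ceil(q * sr / f_k))  # longer windows for lower frequencies
        if n_k > len(x):
            raise ValueError("signal is shorter than the longest CQT kernel")
        t = np.arange(n_k) / sr
        win = np.hanning(n_k)
        kernel = win * np.exp(-2j * np.pi * f_k * t) / win.sum()
        mags[k] = np.abs(np.dot(x[:n_k], kernel))
    return freqs, mags


sr = 16000
t = np.arange(sr) / sr  # 1 s of audio
semitones = 5
tone = np.sin(2 * np.pi * 220.0 * t)  # A3
shifted = np.sin(2 * np.pi * 220.0 * 2 ** (semitones / 12) * t)  # D4, a fourth higher

_, m_tone = naive_cqt_magnitude(tone, sr)
_, m_shift = naive_cqt_magnitude(shifted, sr)

# With 24 bins per octave, one semitone is 2 bins, so a 5-semitone pitch shift
# should appear as a 10-bin translation of the whole magnitude profile.
bin_shift = semitones * 24 // 12
print(np.argmax(m_tone), np.argmax(m_shift))  # expected: 48 58
print(np.max(np.abs(m_shift[bin_shift:] - m_tone[:-bin_shift])))  # small discretisation error
```

Because a pitch shift becomes a shift along the bin axis, any translation-equivariant operation applied along that axis, such as a convolution, treats all pitches alike. The direct correlation above is slow and not invertible; CQT-Diff relies on an invertible CQT, which this sketch does not attempt to reproduce.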
Original language: English
Title of host publication: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Publisher: IEEE
Pages: 1-5
Number of pages: 5
ISBN (Electronic): 978-1-7281-6327-7
ISBN (Print): 978-1-7281-6328-4
Publication status: Published - 10 Jun 2023
MoE publication type: A4 Conference publication
Event: IEEE International Conference on Acoustics, Speech, and Signal Processing - Rhodes Island, Greece
Duration: 4 Jun 2023 to 10 Jun 2023

Publication series

Name: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing
ISSN (Electronic): 2379-190X

Conference

Conference: IEEE International Conference on Acoustics, Speech, and Signal Processing
Abbreviated title: ICASSP
Country/Territory: Greece
City: Rhodes Island
Period: 04/06/2023 to 10/06/2023

Keywords

  • Adaptation models
  • Time-frequency analysis
  • Inverse problems
  • Acoustic noise
  • Bandwidth
  • Transforms
  • Signal processing

Fingerprint

Research topics of 'Solving Audio Inverse Problems with a Diffusion Model'.