Abstract
Spatial audio has the potential to revolutionize how we consume music and other audio content by enabling an immersive audio experience. The technology and entertainment industries have therefore recently adapted their services and begun delivering spatial audio formats. Higher-order Ambisonics (HOA), which represents the audio scene in the spherical harmonic domain (SHD), offers various benefits as a spatial audio format, notably independence from the recording and reproduction setup. However, a critical challenge remains: high-quality spatial audio content is largely inaccessible because of the number of audio channels and the amount of data it requires. Audio codecs can reduce the technical challenges of distribution and storage. Although demand for high channel-count spatial audio continues to rise, traditional multichannel codecs fall short of delivering the performance required for HOA. Akin to parametric audio coding, model-based parametric spatial audio techniques can be adapted for perceptual spatial audio coding. Such techniques can parameterize the input scene in a perceptually meaningful and compact way. The input scene parameterization allows signal-dependent processing such as directional optimizations and informed upmixing, overcoming typical challenges of signal-independent processing. This work proposes a spatial audio codec for HOA using parametric Directional Audio Coding (DirAC). First, a modified spherical harmonic transform strategy is developed that enables analysis, modification, and reconstruction of HOA signals. A subsequent study explores a compression strategy that achieves perfect reconstruction of low-order SHD components and parameterized resynthesis of higher-order SHD components, establishing the perceptual effectiveness of this duality. Furthermore, SHD post-processing is derived that leverages the input parameterization to improve the codec output by matching it to target signal properties.
Finally, this work introduces a HOA audio codec based on the aforementioned theoretical foundations. The experimental results demonstrate significant improvements over traditional multichannel audio codecs, highlighting the potential of the proposed codec to deliver high-quality spatial audio and making the case for transmitting the input parameterization as side information rather than coding an excessive number of channels. The implemented codec achieves excellent perceptual quality ratings while reducing the transport data to only a few percent of the input audio data. In conclusion, this research advances the state of the art in spatial audio coding and enables further development of spatial audio codecs for delivering HOA, making the HOA format and its benefits more accessible and thus enabling wider adoption in various media applications.
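To give a rough sense of the channel counts the abstract refers to, the sketch below uses the standard relation that a full 3D Ambisonics signal of order N carries (N + 1)^2 spherical harmonic channels. The sample rate and bit depth are assumed values chosen for illustration, not figures from the thesis, and the function names are hypothetical.

```python
def hoa_channels(order: int) -> int:
    """Number of spherical harmonic channels of a full 3D Ambisonics signal."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2


def raw_pcm_kbps(order: int, sample_rate_hz: int = 48_000, bits_per_sample: int = 16) -> float:
    """Uncompressed PCM bitrate in kbit/s.

    The default sample rate and bit depth are assumptions for illustration only.
    """
    return hoa_channels(order) * sample_rate_hz * bits_per_sample / 1000


if __name__ == "__main__":
    # Channel count grows quadratically with the Ambisonics order.
    for order in range(1, 6):
        print(f"order {order}: {hoa_channels(order):2d} channels, "
              f"{raw_pcm_kbps(order):6.0f} kbit/s raw PCM")
```

Under these assumptions, a fifth-order signal already needs 36 channels, which illustrates why coding every channel independently becomes costly at higher orders.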
Translated title of the contribution | A Parametric Spatial Audio Compression Codec for Higher-Order Ambisonics |
---|---|
Original language | English |
Qualification | Doctor's degree |
Awarding institution | |
Supervisor/Advisor | |
Publisher | |
Print ISBN | 978-952-64-2078-3 |
Electronic ISBN | 978-952-64-2079-0 |
Publication status | Published - 2024 |
MoE publication type | G5 Doctoral dissertation (article) |