Perceptually-motivated spatial audio codec for higher-order Ambisonics compression - Examples

Dataset

Description

Scene-based spatial audio formats, such as Ambisonics, are playback-system agnostic and may therefore be favoured for delivering immersive audio experiences to a wide range of (potentially unknown) devices. The number of channels required to deliver high-spatial-resolution Ambisonic audio, however, can be prohibitive for low-bandwidth applications. Therefore, in this paper, a compression codec is proposed that is based upon the higher-order Directional Audio Coding (HO-DirAC) model. The encoder downmixes the higher-order Ambisonics (HOA) input audio into a reduced number of signals, which are accompanied by spatial parameterisation metadata. The downmixed audio is coded using a perceptual audio coder, whereas the metadata is grouped into perceptual bands, quantised, and downsampled. On the decoder side, low Ambisonic orders are fully recovered, whereas the high Ambisonic orders that cannot be fully recovered are synthesised based on the spatial metadata. The results of a listening test indicate that the proposed parametric spatial audio codec can improve upon the adopted perceptual coder, especially at low to medium-high bitrates, when applied to fifth-order HOA signals.
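
To illustrate why the channel count becomes prohibitive, the following minimal sketch computes the number of channels of a full 3D Ambisonic signal of order N, which is (N + 1)^2, together with the corresponding uncompressed PCM bitrate. The function names, the 48 kHz sample rate, and the 16-bit depth are assumptions chosen for illustration; they are not taken from the paper or the dataset.

```python
def hoa_channels(order: int) -> int:
    """Number of channels in a full 3D Ambisonic signal of the given order."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2


def raw_pcm_kbps(order: int, sample_rate_hz: int = 48_000, bit_depth: int = 16) -> float:
    """Uncompressed PCM bitrate in kbit/s.

    The default sample rate and bit depth are illustrative assumptions.
    """
    return hoa_channels(order) * sample_rate_hz * bit_depth / 1000


# Print channel count and raw bitrate for orders 1 to 5.
for n in range(1, 6):
    print(f"order {n}: {hoa_channels(n):2d} channels, {raw_pcm_kbps(n):8.0f} kbit/s raw PCM")
```

Under these assumptions, a fifth-order signal carries 36 channels, nine times as many as a first-order signal, which is the kind of overhead that motivates downmixing to fewer signals plus parametric metadata.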
Date made available: 11 Sept 2023
Publisher: Zenodo

Dataset Licences

  • CC-BY-4.0
