Techniques for versatile spatial-audio reproduction in time-frequency domain

Mikko-Ville Laitinen

Research output: Thesis › Doctoral Thesis › Collection of Articles

Abstract

We can perceive many spatial aspects of the sounds around us. These include the direction, the distance, and the size of the sound source, as well as properties of the space we are in. Thus, reproduction of sound should take these spatial properties into account if natural perception of a sound scene is desired. Directional audio coding (DirAC) is a recently proposed method for spatial sound reproduction. It operates in the time-frequency domain and aims to analyze the perceptually significant properties of the sound field. The analyzed parameters, namely the direction of arrival and the diffuseness, are used for manipulating recorded microphone signals in such a way that the perception of the reproduced sound field is equal to that of the original sound field. Subjective evaluations have shown that, compared to traditional methods, DirAC improves the perceived quality. However, DirAC was originally introduced for relatively limited use cases.

This thesis presents methods to generalize the DirAC approach for more versatile use. The generalization is performed for three aspects: challenging spatial-sound scenarios, output systems, and input systems. As DirAC is a parametric method, the resulting quality is signal dependent. Thus, challenging sound scenarios for DirAC processing were sought in order to improve the processing and to enable good quality with all kinds of signals. A few problematic cases were found, e.g., multiple simultaneous talkers in low-echoic conditions and applause-type signals. This thesis shows that the decorrelation processing used in DirAC increases the perceived spaciousness with certain signals. Alternative methods for these problematic cases are introduced, and subjective evaluation shows that they improve the perceived quality.

DirAC originally used loudspeakers for reproduction. As an addition to the possible reproduction devices, a method for headphone reproduction is presented in this thesis. The method is based on binaural techniques and head tracking, and subjective evaluations show that a natural spatial impression can be reproduced.

DirAC was originally developed to be used with B-format microphones, but in practice they are rarely used for recording. A method for more common spaced-microphone arrays, which is additionally shown to have some advantages compared to the B-format processing, is presented in this thesis. Furthermore, DirAC is extended to be used with legacy multi-channel signals, such as 5.1 surround, and even further to virtual-world spatial audio. Finally, a modular structure for DirAC processing is introduced. The structure allows several types of inputs to be used simultaneously without compromising the quality of reproduction.
Translated title of the contribution: Tekniikoita monipuoliseen tilaäänen toistamiseen aika-taajuusalueessa
Original language: English
Qualification: Doctor's degree
Awarding Institution
  • Aalto University
Supervisors/Advisors
  • Pulkki, Ville, Supervising Professor
  • Pulkki, Ville, Thesis Advisor
Publisher
Print ISBNs: 978-952-60-5528-2
Electronic ISBNs: 978-952-60-5529-9
Publication status: Published - 2014
MoE publication type: G5 Doctoral dissertation (article)

Keywords

  • spatial audio
  • multi-channel reproduction
