Abstract
Accurately characterizing a user’s acoustic environment is essential for creating virtual sound sources in augmented reality that blend seamlessly into the real environment. The acoustic parameters of an environment can be calculated from a room impulse response (RIR), and the authors recently presented a method to blindly estimate RIRs from speech signals captured with a head-worn microphone array. The approach uses either speech from a distant speaker or the own speech of the person wearing the array. While both variants provide reliable reverberation time estimates, direct-to-reverberant energy ratio (DRR) estimates from the user’s own speech deviate significantly from the expected DRR of a distant virtual source due to the higher direct sound level. This study investigates the feasibility of extrapolating DRR values from own speech to predict DRRs of distant sources. The approach relies on two acoustic assumptions: (i) the mouth-to-array transfer paths do not change significantly between users, and (ii) the reverberant field is homogeneous. Our findings show that the assumptions hold above the Schröder frequency and in sufficiently reverberant conditions. Average DRR extrapolation errors are below 2 dB at mid frequencies when using mouth simulator measurements and around 3 dB with actual speech recordings.
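To make the quantities in the abstract concrete, the following minimal sketch computes a broadband DRR from a given RIR and then shifts an own-speech DRR toward a distant source. It is an illustration under stated assumptions, not the authors' estimation or extrapolation procedure: the function names, the 2.5 ms direct-sound window, and the `direct_level_drop_db` parameter are hypothetical, and the shift simply assumes that the reverberant energy is identical for both sources (the homogeneous-field assumption), so only the direct level changes.

```python
import numpy as np


def drr_db(rir, fs, direct_window_ms=2.5):
    """Broadband DRR in dB for one impulse response.

    Assumption: the direct sound is the strongest tap, and everything
    after a short window around it counts as reverberant energy.
    """
    rir = np.asarray(rir, dtype=float)
    peak = int(np.argmax(np.abs(rir)))               # index of the direct path
    half = int(round(direct_window_ms * 1e-3 * fs))  # half-window in samples
    start = max(peak - half, 0)
    stop = min(peak + half + 1, rir.size)
    direct = np.sum(rir[start:stop] ** 2)
    reverberant = np.sum(rir[stop:] ** 2)
    return 10.0 * np.log10(direct / reverberant)


def extrapolate_drr_db(drr_own_db, direct_level_drop_db):
    """Shift an own-speech DRR to a distant source (hypothetical helper).

    Assumption: the reverberant level is the same for both sources, so
    the DRR drops by exactly the drop in direct sound level (in dB).
    """
    return drr_own_db - direct_level_drop_db


# Synthetic example: a unit direct tap followed by one weak late reflection.
fs = 16000
rir = np.zeros(1000)
rir[10] = 1.0
rir[500] = 0.1
own = drr_db(rir, fs)
distant = extrapolate_drr_db(own, direct_level_drop_db=15.0)
```

The sketch is broadband for brevity; a per-band variant would filter the RIR first, which matters here because the abstract reports that the assumptions hold only above the Schröder frequency. An RIR with no energy after the direct window would make the ratio undefined.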
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the European Signal Processing Conference (EUSIPCO) |
| Publisher | European Association For Signal and Image Processing |
| Number of pages | 5 |
| ISBN (Electronic) | 978-94-645936-2-4 |
| DOIs | |
| Publication status | Published - 2025 |
| MoE publication type | A4 Conference publication |
| Event | European Signal Processing Conference, Palermo, Italy; Duration: 8 Sept 2025 → 12 Sept 2025; Conference number: 33 |
Conference
| Conference | European Signal Processing Conference |
|---|---|
| Abbreviated title | EUSIPCO |
| Country/Territory | Italy |
| City | Palermo |
| Period | 08/09/2025 → 12/09/2025 |
Fingerprint
Dive into the research topics of 'Direct-to-Reverberant Energy Ratio Estimation and Extrapolation from Own Speech'. Together they form a unique fingerprint.