Post-processing speech recordings during MRI

Research output: Scientific - peer-review - Article


Original language: English
Pages (from-to): 11-22
Number of pages: 12
State: Published - 2018
MoE publication type: A1 Journal article-refereed



We discuss post-processing of speech samples recorded during Magnetic Resonance Imaging (MRI) of the upper airways. These speech recordings contain high levels of acoustic noise from the MRI scanner. The required noise reduction is based on adaptively fitted comb filters, and it has been designed with the special requirements of the subsequent vowel formant extraction in mind. Furthermore, the frequency response of the sound signal path is not flat because MRI technology imposes severe restrictions on the recording instrumentation and arrangements.
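To illustrate the idea of a comb filter fitted to periodic scanner noise, the following is a minimal sketch and not the algorithm of the article. It assumes the noise is exactly periodic, estimates its period from the autocorrelation of a noise-only excerpt, and applies a feedforward comb y[n] = (x[n] - x[n - D]) / 2. All names (`estimate_period`, `comb_notch`), the sampling rate, and the synthetic test tones are hypothetical choices made for this example.

```python
import math


def estimate_period(noise, min_lag, max_lag):
    """Return the lag (in samples) with the largest normalised autocorrelation."""
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        terms = len(noise) - lag
        # Divide by the number of terms so that longer lags are not penalised.
        score = sum(noise[n] * noise[n - lag] for n in range(lag, len(noise))) / terms
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def comb_notch(signal, period):
    """Feedforward comb y[n] = (x[n] - x[n - period]) / 2.

    The magnitude response is |sin(pi * f * period / fs)|, so the filter has
    nulls at every multiple of fs / period and passes components in between.
    """
    out = []
    for n, x in enumerate(signal):
        delayed = signal[n - period] if n >= period else 0.0
        out.append(0.5 * (x - delayed))
    return out


def energy(samples):
    return sum(s * s for s in samples)


# Synthetic stand-ins (assumptions): harmonic "scanner" noise at 200 Hz and
# 400 Hz, and a single 730 Hz tone standing in for a vowel component.
FS = 8000
N = 800
noise = [math.sin(2 * math.pi * 200 * n / FS) + 0.5 * math.sin(2 * math.pi * 400 * n / FS)
         for n in range(N)]
vowel = [math.sin(2 * math.pi * 730 * n / FS) for n in range(N)]
recording = [v + z for v, z in zip(vowel, noise)]

period = estimate_period(noise, min_lag=20, max_lag=60)
cleaned = comb_notch(recording, period)
```

In this sketch the comb removes the periodic noise but also scales every other component by |sin(pi * f * D / fs)|, which distorts the spectral envelope. That side effect is one reason a noise reduction scheme intended for formant extraction has to be designed with care.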

Two kinds of speech materials were used to validate the post-processing algorithm. The primary material consists of samples of prolonged vowel productions during MRI. The comparison data was obtained from the same test subject, and it was recorded in an anechoic chamber in a configuration that resembles the setting used in the MRI experiments but excludes the surrounding structures of the MRI scanner. Spectral envelopes and vowel formants were computed from the post-processed speech as well as from the comparison data. Pure vowel samples (with a known formant structure) were artificially contaminated with MRI scanner noise in order to determine the performance of the post-processing algorithm in cases where using true data from the MRI experiments would be difficult. Resonances computed from a numerical acoustic model (based on the Helmholtz equation) and spectra measured from 3D-printed physical models of the vocal tract were also used as comparison data.
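The artificial contamination step can be sketched as mixing a clean sample with noise at a chosen signal-to-noise ratio. This is only an illustration under stated assumptions: the abstract does not say how the mixing levels were chosen, the function name `mix_at_snr` is hypothetical, and the two sinusoids below stand in for a real vowel recording and real scanner noise.

```python
import math


def mean_power(samples):
    """Average power of a sequence of samples."""
    return sum(s * s for s in samples) / len(samples)


def mix_at_snr(clean, noise, snr_db):
    """Add `noise` to `clean`, scaled so the mixture has the requested SNR in dB."""
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    noise = noise[:len(clean)]
    # SNR (dB) = 10 * log10(P_clean / P_noise), solved for the noise power.
    target_noise_power = mean_power(clean) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / mean_power(noise))
    return [c + gain * z for c, z in zip(clean, noise)]


# Synthetic stand-ins (assumptions), not data from the study.
FS = 8000
clean = [math.sin(2 * math.pi * 730 * n / FS) for n in range(800)]
noise = [math.sin(2 * math.pi * 200 * n / FS) for n in range(1000)]
noisy = mix_at_snr(clean, noise, snr_db=0.0)
```

Because the clean signal and its formant structure are known, the output of a noise reduction step applied to `noisy` can be compared directly against `clean`.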

Neither the properties of the recording instrumentation nor the post-processing algorithm explains the observed frequency-dependent discrepancy between the vowel formant data from experiments during MRI and comparable data recorded in the anechoic chamber. The discrepancy is shown to be statistically significant, in particular where it is largest, at 1 kHz and 2 kHz. Numerical and experimental evidence suggests that the surfaces of the MRI head coil change the acoustics of speech, which results in “exterior formants” at these frequencies. The change is so large that it cannot be neglected if sound recordings made during MRI experiments are to be used for parameter estimation or validation of a numerical speech model based on the MRI geometries of the vocal tract. However, a contribution from the test subject's adaptation to noise and to the acoustics of the constrained space during an MRI examination cannot be ruled out.

Research areas

• speech, MRI, noise reduction, DSP, Helmholtz equation

ID: 9616481