Speech recordings made during a Magnetic Resonance Imaging (MRI) experiment yield valuable but noisy acoustic data for modelling purposes. Despite recent improvements in optical microphones for MRI, an acoustic sound recording system in dipole configuration, based on shielded electret microphones and waveguides, has some inherent advantages: for example, the bandwidth can be made very wide, and the extremely linear behaviour of all components facilitates numerical post-processing of the signals. Here, the full measurement chain and the critical design decisions are reviewed. In particular, one must take into account the resonant behaviour of the necessarily non-optimally terminated waveguides as well as the acoustic environment inside the MRI coils. Two competing approaches to signal post-processing were developed: (i) optimal subtraction of the noise and speech signals, and (ii) spectral peak removal by adaptively fitted comb filters. The main objective is high-quality spectral identification, estimation, and classification of source features, e.g., vowel formants. Approach (i) preserves the phase behaviour of the original signal. Approach (ii) produces excellent speech sound quality when the noise consists of only a few harmonic sources, which is a salient property of MRI acoustic noise. Both approaches must compensate for the frequency-dependent damping of the measurement chain.
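
As a rough illustration of approach (i), the following minimal sketch subtracts a least-squares-scaled noise reference from the speech channel. It is not the implementation reviewed here: the single scalar gain, the function name `optimal_subtraction`, and the synthetic test signals are all assumptions made for illustration, and a practical system would likely need a frequency-dependent weighting to account for the damping of the measurement chain.

```python
import numpy as np


def optimal_subtraction(speech_ch, noise_ch):
    """Remove the noise reference from the speech channel.

    Assumption: one scalar gain g is enough to match the two channels.
    g minimises ||speech_ch - g * noise_ch||^2 (ordinary least squares).
    """
    g = np.dot(speech_ch, noise_ch) / np.dot(noise_ch, noise_ch)
    return speech_ch - g * noise_ch, g


# Synthetic demonstration (hypothetical signals, not measured data).
fs = 44100                       # sampling rate in Hz
t = np.arange(fs) / fs           # one second of samples

# Stand-in for a voiced sound: a single 220 Hz tone.
speech = 0.5 * np.sin(2 * np.pi * 220.0 * t)

# Stand-in for MRI noise: a few harmonic components.
noise = sum(np.sin(2 * np.pi * f * t) for f in (1000.0, 2000.0, 3000.0))

speech_ch = speech + 0.8 * noise  # channel facing the speaker
noise_ch = noise                  # channel picking up noise only

cleaned, g = optimal_subtraction(speech_ch, noise_ch)
```

Because the subtraction is a linear, sample-wise operation, the phase of the speech component is left untouched, which is the property attributed to approach (i) above. In this idealised example the noise channel contains no speech leakage; with real recordings the estimated gain would be biased by any such leakage.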