Comparing human and automatic speech recognition in a perceptual restoration experiment

Ulpu Remes*, Ana Ramírez López, Lauri Juvela, Kalle Palomäki, Guy J. Brown, Paavo Alku, Mikko Kurimo

*Corresponding author for this work

Research output: Contribution to journal › Article › Scientific › peer-review


Speech that has been distorted by introducing spectral or temporal gaps is still perceived as continuous and complete by human listeners, so long as the gaps are filled with additive noise of sufficient intensity. When such perceptual restoration occurs, the speech is also more intelligible than when no noise is added in the gaps. This observation has motivated so-called 'missing data' systems for automatic speech recognition (ASR), but there have been few attempts to determine whether such systems are a good model of perceptual restoration in human listeners. This paper therefore evaluates missing data ASR in a perceptual restoration task. We evaluated two systems: one that uses a new approach to bounded marginalisation in the cepstral domain, and one that uses a bounded conditional mean imputation method. Both methods model the available speech information as a clean-speech posterior distribution that is subsequently passed to an ASR system. The proposed missing data ASR systems were evaluated using distorted speech, in which spectro-temporal gaps were optionally filled with additive noise. The speech recognition performance of the proposed systems was compared with that of a baseline ASR system and with human speech recognition performance on the same task. We conclude that missing data methods improve speech recognition performance in a manner that is consistent with perceptual restoration in human listeners.
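The abstract does not give the estimators themselves. As a rough illustration only, the sketch below implements the classic spectral-domain form of bounded conditional mean imputation with a diagonal Gaussian mixture prior over clean speech. It is not the authors' cepstral-domain method. Features marked reliable are kept as observed. Each unreliable feature is assumed to lie between a lower bound and the observed value, because additive noise can only raise the observed energy. The output is a per-feature posterior mean and variance, which is the kind of observation uncertainty a recognizer can consume. The function names, the binary mask, the GMM prior and the default lower bound of minus infinity are all assumptions made for this example.

```python
import math

LOG_SQRT_2PI = 0.5 * math.log(2.0 * math.pi)


def norm_pdf(z):
    """Standard normal density; 0 at +/-inf."""
    if math.isinf(z):
        return 0.0
    return math.exp(-0.5 * z * z - LOG_SQRT_2PI)


def norm_cdf(z):
    """Standard normal CDF via math.erf (erf accepts +/-inf)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def truncated_moments(mu, sigma, lower, upper):
    """Mean, variance and probability mass of N(mu, sigma^2) on [lower, upper]."""
    a = (lower - mu) / sigma
    b = (upper - mu) / sigma
    mass = max(norm_cdf(b) - norm_cdf(a), 1e-300)  # floor avoids division by zero
    pa, pb = norm_pdf(a), norm_pdf(b)
    apa = 0.0 if math.isinf(a) else a * pa  # limit of a*phi(a) at infinity is 0
    bpb = 0.0 if math.isinf(b) else b * pb
    ratio = (pa - pb) / mass
    mean = mu + sigma * ratio
    var = sigma ** 2 * (1.0 + (apa - bpb) / mass - ratio ** 2)
    return mean, var, mass


def bounded_imputation(x, reliable, weights, means, sds, lower=-math.inf):
    """Hypothetical sketch of bounded conditional mean imputation for one frame.

    x: observed log-spectral features; reliable: True where speech dominates.
    weights/means/sds: diagonal GMM prior over clean speech (assumed).
    Returns posterior means and variances (variance 0 for reliable features).
    """
    log_post, stats = [], []
    for w, mu_k, sd_k in zip(weights, means, sds):
        lp = math.log(w)
        comp = []
        for d, xd in enumerate(x):
            if reliable[d]:
                z = (xd - mu_k[d]) / sd_k[d]
                lp += -0.5 * z * z - math.log(sd_k[d]) - LOG_SQRT_2PI
                comp.append(None)
            else:
                # The log of the bounded mass is the bounded-marginalisation term.
                m, v, mass = truncated_moments(mu_k[d], sd_k[d], lower, xd)
                lp += math.log(mass)
                comp.append((m, v))
        log_post.append(lp)
        stats.append(comp)

    # Component responsibilities via a numerically stable log-sum-exp.
    top = max(log_post)
    resp = [math.exp(lp - top) for lp in log_post]
    total = sum(resp)
    resp = [r / total for r in resp]

    post_mean, post_var = [], []
    for d, xd in enumerate(x):
        if reliable[d]:
            post_mean.append(xd)
            post_var.append(0.0)
            continue
        m = sum(r * s[d][0] for r, s in zip(resp, stats))
        second = sum(r * (s[d][1] + s[d][0] ** 2) for r, s in zip(resp, stats))
        post_mean.append(m)
        post_var.append(max(second - m * m, 0.0))
    return post_mean, post_var
```

With a single standard-normal component, `bounded_imputation([0.5, 0.0], [True, False], [1.0], [[0.0, 0.0]], [[1.0, 1.0]])` keeps the first feature at 0.5 with zero variance. It replaces the second feature with the half-normal mean -sqrt(2/pi), about -0.798, and variance 1 - 2/pi, about 0.363. Returning the variance alongside the mean, rather than a point estimate alone, is what lets a recognizer down-weight imputed features.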

Original language: English
Pages (from-to): 14-31
Number of pages: 18
Journal: Computer Speech and Language
Publication status: Published - 11 Jul 2016
MoE publication type: A1 Journal article-refereed


Keywords
  • Automatic speech recognition
  • Missing data
  • Observation uncertainties
  • Perceptual restoration
  • Uncertainty propagation

