Experiments on adaptation methods to improve acoustic modeling for French speech recognition

Saeideh Mirzaei, Pierrick Milhorat, Jérôme Boudy, Gérard Chollet, Mikko Kurimo

Research output: Chapter in Book/Report/Conference proceeding, Conference article in proceedings, Scientific, peer-review

Abstract

To improve the performance of Automatic Speech Recognition (ASR) systems, the models must be retrained to better adjust to the speaker's voice characteristics, the environmental and channel conditions, or the context of the task. In this project we focus on the mismatch between the acoustic features used to train the model and the vocal characteristics of the front-end user of the system. To overcome this mismatch, speaker adaptation techniques have been used. A significant performance improvement has been shown using constrained Maximum Likelihood Linear Regression (cMLLR) model adaptation methods, while a fast adaptation is guaranteed by using linear Vocal Tract Length Normalization (lVTLN). We have achieved a relative gain of approximately 9.44% in the word error rate with unsupervised cMLLR adaptation. We also compare our ASR system with the Google ASR and show that, using adaptation methods, we exceed its performance.
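
The abstract reports its result as a relative gain in word error rate (WER). As a minimal sketch of how such a figure is usually computed, assuming the standard definition of WER as word-level edit distance divided by reference length, the snippet below may help. The function names `word_error_rate` and `relative_wer_reduction` are hypothetical and not taken from the paper. The paper's own scoring tool and its baseline and adapted WER values are not given in this record.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Relative gain in percent: 100 * (baseline - adapted) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - adapted_wer) / baseline_wer
```

With purely illustrative inputs, `relative_wer_reduction(0.25, 0.20)` gives a relative gain of about 20%, although the absolute WER drops by only 5 points. The 9.44% in the abstract is a relative figure of this kind.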

Original language: English
Title of host publication: ICPRAM 2016 - Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods
Publisher: SciTePress
Pages: 278-282
Number of pages: 5
ISBN (Print): 9789897581731
Publication status: Published - 2016
MoE publication type: A4 Conference publication
Event: International Conference on Pattern Recognition Applications and Methods - Rome, Italy
Duration: 24 Feb 2016 to 26 Feb 2016
Conference number: 5

Conference

Conference: International Conference on Pattern Recognition Applications and Methods
Abbreviated title: ICPRAM
Country/Territory: Italy
City: Rome
Period: 24/02/2016 to 26/02/2016

Keywords

  • Linear regression
  • Speaker adaptation
  • Speech recognition
  • Vocal tract
