Multilingual TTS Accent Impressions for Accented ASR

Georgios Karakasidis*, Nathaniel Robinson, Yaroslav Getman, Atieno Ogayo, Ragheb Al-Ghezi, Ananya Ayasi, Shinji Watanabe, David R. Mortensen, Mikko Kurimo

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceedings › Scientific › peer-review

Abstract

Automatic Speech Recognition (ASR) for high-resource languages like English is often considered a solved problem. However, most high-resource ASR systems favor socioeconomically advantaged dialects. In the case of English, this leaves behind many L2 speakers and speakers of low-resource accents (a majority of English speakers). One way to mitigate this is to fine-tune a pre-trained English ASR model for a desired low-resource accent. However, collecting transcribed accented audio is costly and time-consuming. In this work, we present a method to produce synthetic L2-English speech via pre-trained text-to-speech (TTS) in an L1 language (target accent). This can be produced at a much larger scale and lower cost than authentic speech collection. We present initial experiments applying this augmentation method. Our results suggest that success of TTS augmentation relies on access to more than one hour of authentic training data and a diversity of target-domain prompts for speech synthesis.

Original language: English
Title of host publication: Text, Speech, and Dialogue - 26th International Conference, TSD 2023, Proceedings
Editors: Kamil Ekštein, František Pártl, Miloslav Konopík
Publisher: Springer
Pages: 317-327
Number of pages: 11
ISBN (Print): 978-3-031-40497-9
DOIs
Publication status: Published - 2023
MoE publication type: A4 Conference publication
Event: International Conference on Text, Speech, and Dialogue - Pilsen, Czech Republic
Duration: 4 Sept 2023 to 6 Sept 2023

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 14102 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: International Conference on Text, Speech, and Dialogue
Abbreviated title: TSD
Country/Territory: Czech Republic
City: Pilsen
Period: 04/09/2023 to 06/09/2023

Keywords

  • accented speech recognition
  • data augmentation
  • low-resource speech technologies
  • speech synthesis
