Self-supervised end-to-end ASR for low resource L2 Swedish

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceedings › Scientific › peer-review

9 Citations (Scopus)
149 Downloads (Pure)

Abstract

Unlike traditional (hybrid) Automatic Speech Recognition (ASR), end-to-end ASR systems simplify the training procedure by directly mapping acoustic features to sequences of graphemes or characters, thereby eliminating the need for specialized acoustic, language, or pronunciation models. However, one drawback of end-to-end ASR systems is that they require more training data than conventional ASR systems to achieve a similar word error rate (WER). This makes it difficult to develop ASR systems for tasks where transcribed target data is limited, such as developing ASR for Second Language (L2) speakers of Swedish. Nonetheless, recent advancements in self-supervised acoustic learning, manifested in wav2vec models [1, 2, 3], leverage the available untranscribed speech data to provide compact acoustic representations that can achieve low WER when incorporated in end-to-end systems. To this end, we experiment with several monolingual and cross-lingual self-supervised acoustic models to develop an end-to-end ASR system for L2 Swedish. Even though our test is very small, it indicates that these systems are competitive in performance with a traditional ASR pipeline. Our best model seems to reduce the WER by 7% relative to our traditional ASR baseline trained on the same target data.
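For readers unfamiliar with the metric, the following is a minimal sketch of how a word error rate and a relative WER reduction are commonly computed. It is not the authors' evaluation code: the function names, the example sentences, and the WER values (0.30 and 0.279) are hypothetical and were chosen only so that the arithmetic yields a 7% relative reduction.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or match if cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer."""
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical transcripts: one substituted word out of three.
wer = word_error_rate("jag talar svenska", "jag pratar svenska")

# Hypothetical WER values, not results from the paper.
reduction = relative_wer_reduction(0.30, 0.279)
print(f"WER: {wer:.3f}, relative reduction: {reduction:.1%}")
```

Note that a relative reduction is measured against the baseline WER, so it is not the same as a drop of 7 percentage points in absolute WER.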

Original language: English
Title of host publication: 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
Publisher: International Speech Communication Association (ISCA)
Pages: 1086-1090
Number of pages: 5
ISBN (Electronic): 9781713836902
Publication status: Published - 2021
MoE publication type: A4 Conference publication
Event: Interspeech - Brno, Czech Republic
Duration: 30 Aug 2021 – 3 Sept 2021
Conference number: 22

Publication series

Name: Proceedings of the Annual Conference of the International Speech Communication Association
ISSN (Print): 2308-457X
ISSN (Electronic): 1990-9772

Conference

Conference: Interspeech
Abbreviated title: INTERSPEECH
Country/Territory: Czech Republic
City: Brno
Period: 30/08/2021 – 03/09/2021

Funding

This work is part of the Digitala project, which is funded by the Academy of Finland (grant numbers 322619, 322625, 322965). The computational resources were provided by Aalto ScienceIT.

Keywords

  • End-to-End L2 ASR
  • Nonnative ASR
  • Self-supervised

Fingerprint

  • DigiTala: Aka-Digi Tala

    Kurimo, M. (Principal investigator), Getman, Y. (Project Member), Voskoboinik, E. (Project Member) & Al-Ghezi, R. (Project Member)

    01/01/2020 – 31/08/2023

    Project: RCF Academy Project targeted call

  • Science-IT

    Hakala, M. (Manager)

    School of Science

    Facility/equipment: Facility
