New data, benchmark and baseline for L2 speaking assessment for low-resource languages

Mikko Kurimo, Yaroslav Getman, Ekaterina Voskoboinik, Ragheb Al-Ghezi, Heini Kallio, Mikko Kuronen, Anna von Zansen, Raili Hilden, Sirkku Kronholm, Ari Huhta, Krister Lindén

Research output: Chapter in Book/Report/Conference proceeding, Conference article in proceedings, Scientific, peer-review

Abstract

The development of large multilingual speech models makes it possible to build high-quality speech technology even for low-resource languages. In this paper, we present speech data from L2 learners of Finnish and Finland Swedish that we have recently collected for training and evaluating automatic speech recognition (ASR) and automatic speaking assessment (ASA) systems. The dataset includes over 4000 recordings by over 300 students per language in short read-aloud and free-form tasks. The recordings have been manually transcribed and assessed for pronunciation, fluency, range, accuracy, task achievement, and a holistic proficiency level. We also present an ASR and ASA benchmarking setup that we have constructed using this data, and we include results from our baseline systems, built by fine-tuning a self-supervised multilingual model for the target language. In addition to benchmarking, our baseline system can be used by L2 students and teachers for online self-training and evaluation of oral proficiency.
Original language: English
Title of host publication: Proceedings of 9th Workshop on Speech and Language Technology in Education (SLaTE)
Publisher: International Speech Communication Association (ISCA)
Pages: 166-170
Number of pages: 5
Publication status: Published - 2023
MoE publication type: A4 Conference publication
Event: Workshop on Speech and Language Technology in Education - Dublin, Ireland
Duration: 18 Aug 2023 to 20 Aug 2023

Publication series

Name: ISCA International Workshop on Speech and Language Technology in Education
ISSN (Electronic): 2311-4975

Workshop

Workshop: Workshop on Speech and Language Technology in Education
Abbreviated title: SLaTE
Country/Territory: Ireland
City: Dublin
Period: 18/08/2023 to 20/08/2023

Keywords

  • Educational sciences
  • suullinen kielitaito
  • kielitaidon arviointi
  • oral language skills
  • language assessment
  • Electronic, automation and communications engineering, electronics
  • puheentunnistus
  • automaattinen puheen arviointi
  • automatic speech recognition
  • automatic speaking assessment

Projects

  • DigiTala: Aka-Digi Tala

    Kurimo, M. (Principal investigator), Getman, Y. (Project Member), Voskoboinik, E. (Project Member) & Al-Ghezi, R. (Project Member)

    01/01/2020 to 31/08/2023

    Project: Academy of Finland: Other research funding
