Use of Self-Supervised Learning in Automated Speaking Scoring for Low Resource Languages

Ragheb Al-Ghezi

Research output: Doctoral Thesis, Collection of Articles

Abstract

Developing automatic systems for assessing speaking proficiency has become increasingly important in second language learning, as it facilitates self-regulated learning and serves as a valuable tool for language proficiency assessment and teacher training programs. However, such systems have primarily been designed for languages with many learners, benefiting from abundant human-transcribed and speech-scored training data. In contrast, languages with fewer learners, such as Finnish and Swedish, face significant challenges due to the limited availability of training data. Nevertheless, recent advancements in AI, particularly in self-supervised machine learning, offer the possibility of developing automatic speech recognition systems even with constrained training data, making it feasible to create automatic speaking assessment systems for under-resourced languages. This dissertation investigates the potential of a self-supervised speech model, specifically Wav2vec2, to develop automatic speech recognition (ASR) and automated scoring models for second language (L2) Swedish and Finnish spoken by young learners, L2 Swedish and Finnish spoken by children, and native Swedish-speaking children with speech sound disorders (SSD). The results show that fine-tuning the monolingual Swedish Wav2vec2 model for ASR achieved a 7% relative improvement in word error rate (WER) over a traditional ASR pipeline using only 5.6 hours of training data, without an external language model or customized pronunciation dictionaries. In addition, Wav2vec2 models were shown to adapt to holistic speaking proficiency tasks, either when fine-tuned directly to predict proficiency levels or when incorporated in a multitask system capable of decoding spoken utterances and predicting ratings concurrently.
Furthermore, deep latent representations (embeddings) extracted from ASR-fine-tuned Wav2vec2 were shown to predict holistic proficiency in L2 Finnish and Swedish, yielding a 20% improvement in F1 score relative to the pre-trained embeddings and manually crafted features. The dissertation also presents an experimental evaluation of analytical models assessing components of spontaneous speaking proficiency, such as pronunciation, fluency, and lexicogrammatical proficiency, yielding human-machine agreement comparable to human-human inter-rater agreement. In short, fine-tuned ASR models facilitated the design and implementation of automated read-aloud and spontaneous speaking rating models for the aforementioned low-resource tasks.
Original language: English
Qualification: Doctoral degree
Awarding institution
  • Aalto University
Supervisor/Advisor
  • Kurimo, Mikko, Supervising Professor
Publisher
Print ISBN: 978-952-64-1862-9
Electronic ISBN: 978-952-64-1863-6
Publication status: Published - 2024
MoE publication type: G5 Doctoral dissertation (article)
