Exploring adaptation techniques of large speech foundation models for low-resource ASR: a case study on Northern Sámi

Yaroslav Getman, Tamás Grósz, Katri Hiovain-Asikainen, Mikko Kurimo

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceedings › Scientific › peer-review


Abstract

Speech foundation models such as wav2vec 2.0 have made it possible to develop highly accurate models for low-resource languages using a limited amount of speech data. For optimal results, the pre-training should already include data from the target language, but unfortunately, none of the available foundation models include Northern Sámi. In this work, we explore various ways of preparing the foundation model for Northern Sámi, including continued pre-training with a small untranscribed corpus and our new extended fine-tuning method. The extended fine-tuning starts from an already fine-tuned ASR model and augments it with new output units for the unique Sámi characters before further fine-tuning with transcribed Sámi data. Our results demonstrate the benefits of these advanced adaptation techniques, as both approaches lead to better performance than direct fine-tuning-based adaptation.
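The extended fine-tuning step described in the abstract can be illustrated with a short sketch. The snippet below shows one plausible way to add output units for Northern Sámi characters to a wav2vec 2.0 CTC model that has already been fine-tuned for ASR in another language, using the Hugging Face transformers library; the checkpoint name, the character list, and the overall recipe are assumptions for illustration, not the paper's exact implementation.

import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2CTCTokenizer

# Hypothetical starting point: any wav2vec 2.0 checkpoint already
# fine-tuned for ASR with a CTC head (placeholder name).
BASE = "some-org/wav2vec2-finetuned-asr"

model = Wav2Vec2ForCTC.from_pretrained(BASE)
tokenizer = Wav2Vec2CTCTokenizer.from_pretrained(BASE)

# Northern Sámi letters typically missing from such vocabularies.
sami_chars = ["á", "č", "đ", "ŋ", "š", "ŧ", "ž"]
new_chars = [c for c in sami_chars if c not in tokenizer.get_vocab()]
tokenizer.add_tokens(new_chars)

# Grow the CTC output layer to cover the new characters, copying the
# existing weights so the already-learned output units stay intact.
old_head = model.lm_head
new_head = torch.nn.Linear(old_head.in_features, len(tokenizer))
with torch.no_grad():
    new_head.weight[: old_head.out_features] = old_head.weight
    new_head.bias[: old_head.out_features] = old_head.bias
model.lm_head = new_head
model.config.vocab_size = len(tokenizer)

# The model is then fine-tuned on transcribed Northern Sámi speech with
# the usual CTC objective (e.g. via transformers.Trainer); omitted here.

Copying the old weights into the enlarged output layer preserves what the model has already learned for the shared Latin characters, so only the handful of new Sámi units start from scratch.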

Original language: English
Title of host publication: Interspeech 2024
Publisher: International Speech Communication Association (ISCA)
Pages: 2539-2543
Number of pages: 5
DOIs
Publication status: Published - 2024
MoE publication type: A4 Conference publication
Event: Interspeech - Kos Island, Greece
Duration: 1 Sept 2024 – 5 Sept 2024

Publication series

Name: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publisher: International Speech Communication Association (ISCA)
ISSN (Print): 2308-457X

Conference

Conference: Interspeech
Country/Territory: Greece
City: Kos Island
Period: 01/09/2024 – 05/09/2024

Keywords

  • ASR
  • low-resource
  • model adaptation
  • Northern Sámi
  • wav2vec2

