Automatic Speech Recognition for Northern Sámi with comparison to other Uralic Languages

Peter Smit, Juho Leinonen, Kristiina Jokinen, Mikko Kurimo

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceedings › Scientific › peer-review


Abstract

Speech technology applications for major languages are becoming widely available, but for many other languages there is no commercial interest in developing speech technology. As the lack of technology and applications will threaten the existence of these languages, it is important to study how to create speech recognizers with minimal effort and low resources.
As a test case, we have developed a Large Vocabulary Continuous Speech Recognizer for Northern Sámi, a Finno-Ugric language that has few resources available for speech technology. Using only 2.5 hours of audio data and the Northern Sámi Wikipedia for the language model, we achieved a 7.6% Letter Error Rate (LER). With a language model based on a higher-quality language corpus, we achieved 4.2% LER. To put this in perspective, we also trained systems in other, better-resourced Finno-Ugric languages (Finnish and Estonian) with the same amount of data and compared those to state-of-the-art systems in those languages.
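For readers unfamiliar with the metric, the following is a minimal sketch of how a letter error rate is commonly computed: the character-level edit distance between reference and hypothesis transcripts, divided by the number of reference characters. The function names and the toy strings are hypothetical, and the exact scoring conventions used for the reported figures (for example, whether spaces count as letters) are not stated in this record, so this is an illustration under those assumptions rather than the authors' scoring procedure.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1]


def letter_error_rate(refs, hyps) -> float:
    """Total character errors divided by total reference characters.

    Assumption: every character, including spaces, counts as a letter.
    """
    total_letters = sum(len(r) for r in refs)
    if total_letters == 0:
        raise ValueError("reference transcripts contain no characters")
    total_errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return total_errors / total_letters


# Toy example: one substituted character in a four-character reference.
print(letter_error_rate(["abcd"], ["abxd"]))
```

Pooling errors over all utterances before dividing, as done here, weights long utterances more heavily than averaging per-utterance rates would.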
Original language: English
Title of host publication: Proceedings of the Second International Workshop on Computational Linguistics for Uralic Languages
Place of Publication: Szeged, Hungary
Publisher: University of Szeged
Pages: 80-91
Number of pages: 11
ISBN (Electronic): 978-963-306-504-4
Publication status: Published - 20 Jan 2016
MoE publication type: A4 Conference publication
Event: International Workshop on Computational Linguistics for the Uralic Languages - Szeged, Hungary
Duration: 20 Jan 2016 to 21 Jan 2016
Conference number: 2
http://rgai.inf.u-szeged.hu/iwclul2016

Workshop

Workshop: International Workshop on Computational Linguistics for the Uralic Languages
Abbreviated title: IWCLUL
Country/Territory: Hungary
City: Szeged
Period: 20/01/2016 to 21/01/2016