Automatic Speech Recognition for Northern Sámi with comparison to other Uralic Languages

Research output: Article in book or conference proceedings; Conference contribution; Scientific; Peer-reviewed

Abstract

Speech technology applications for major languages are becoming widely available, but for many other languages there is no commercial interest in developing speech technology. As the lack of technology and applications will threaten the existence of these languages, it is important to study how to create speech recognizers with minimal effort and low resources.
As a test case, we have developed a Large Vocabulary Continuous Speech Recognizer for Northern Sámi, a Finno-Ugric language that has few resources available for speech technology. Using only 2.5 hours of audio data and the Northern Sámi Wikipedia for the language model, we achieved a 7.6% Letter Error Rate (LER). With a language model based on a higher-quality language corpus, we achieved 4.2% LER. To put this in perspective, we also trained systems for other, better-resourced Finno-Ugric languages (Finnish and Estonian) with the same amount of data and compared them to state-of-the-art systems in those languages.
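The abstract reports results as Letter Error Rate but does not define it. A common reading, assumed here rather than taken from the paper, is the character-level Levenshtein distance between the recognizer output and the reference transcript, divided by the number of reference characters. The sketch below follows that assumption; the function names `edit_distance` and `letter_error_rate` are hypothetical, and every character, including spaces, is counted.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance between ref and hyp."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # reference letter missing from hypothesis
                curr[j - 1] + 1,     # extra letter in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def letter_error_rate(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Total character edits divided by total reference characters (assumed LER)."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same number of utterances")
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("reference transcripts are empty")
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return total_edits / total_chars
```

For example, `letter_error_rate(["abcd"], ["abxd"])` returns 0.25, since one substitution is needed across four reference letters. Pooling edits over all utterances before dividing weights long utterances more heavily than averaging per-utterance rates would; which convention the authors used is not stated in the abstract.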
Original language: English
Title of host publication: Proceedings of the Second International Workshop on Computational Linguistics for Uralic Languages
Place of publication: Szeged, Hungary
Publisher: University of Szeged
Pages: 80-91
Number of pages: 11
ISBN (electronic): 978-963-306-504-4
Status: Published - 20 January 2016
MoE publication type: A4 Article in conference proceedings
Event: International Workshop on Computational Linguistics for the Uralic Languages - Szeged, Hungary
Duration: 20 January 2016 to 21 January 2016
Conference number: 2
http://rgai.inf.u-szeged.hu/iwclul2016

Workshop

Workshop: International Workshop on Computational Linguistics for the Uralic Languages
Abbreviated title: IWCLUL
Country: Hungary
City: Szeged
Period: 20/01/2016 to 21/01/2016

  • Equipment

    Science-IT

    Mikko Hakala (Manager)

    School of Science

    Equipment/facilities: Facility

  • Cite this

    Smit, P., Leinonen, J., Jokinen, K., & Kurimo, M. (2016). Automatic Speech Recognition for Northern Sámi with comparison to other Uralic Languages. In Proceedings of the Second International Workshop on Computational Linguistics for Uralic Languages (pp. 80-91). [9] Szeged, Hungary: University of Szeged.