Abstract
In this paper we present a Bidirectional LSTM neural network with a Conditional Random Field layer on top, which uses word, character, and morph embeddings to perform named entity recognition on various Finnish datasets. To overcome the lack of annotated training corpora that arises with low-resource languages like Finnish, we tried a knowledge transfer technique to transfer tags from an Estonian dataset. On the human-annotated in-domain Digitoday dataset, our system achieved an F1 score of 84.73. On the out-of-domain Wikipedia set, we obtained an F1 score of 67.66. To see how well the system performs on speech data, we used two datasets containing automatic speech recognition outputs. Since we do not have true labels for those datasets, we used a rule-based system to annotate them and used those annotations as reference labels. On the first dataset, which contains Finnish parliament sessions, we obtained an F1 score of 42.09, and on the second, which contains talks from Yle Pressiklubi, we obtained an F1 score of 74.54.
Original language | English |
---|---|
Title of host publication | AI4TV 2020 - Proceedings of the 2nd International Workshop on AI for Smart TV Content Production, Access and Delivery |
Publisher | ACM |
Pages | 25-29 |
Number of pages | 5 |
ISBN (Electronic) | 9781450381468 |
Publication status | Published - 12 Oct 2020 |
MoE publication type | A4 Conference publication |
Event | International Workshop on AI for Smart TV Content Production, Access and Delivery - Virtual, Online, United States. Duration: 12 Oct 2020 → 12 Oct 2020. Conference number: 2 |
Workshop
Workshop | International Workshop on AI for Smart TV Content Production, Access and Delivery |
---|---|
Abbreviated title | AI4TV |
Country/Territory | United States |
City | Virtual, Online |
Period | 12/10/2020 → 12/10/2020 |
Keywords
- low-resource
- named entity recognition
- speech recognition
Fingerprint
Dive into the research topics of 'Named Entity Recognition for Spoken Finnish'. Together they form a unique fingerprint.
Projects
2 Finished
- Movie Making Finland: Finnish fiction films as audiovisual big data, 1907-2017
  Kurimo, M. (Principal investigator), Virkkunen, A. (Project Member), Moisio, A. (Project Member), Porjazovski, D. (Project Member) & Kathania, H. (Project Member)
  01/01/2020 → 31/12/2022
  Project: Academy of Finland: Other research funding
- MeMAD: Methods for Managing Audiovisual Data: Combining Automatic Efficiency with Human Accuracy
  Kurimo, M. (Principal investigator), Grönroos, S.-A. (Project Member), Brander, T. (Project Member), Porjazovski, D. (Project Member), Raitio, R. (Project Member), Grósz, T. (Project Member), Virkkunen, A. (Project Member) & Rouhe, A. (Project Member)
  27/12/2017 → 31/03/2021
  Project: EU: Framework programmes funding