Abstract
Semi-supervised sequence labeling is an effective way to train a low-resource
morphological segmentation system. We show that a feature set augmentation
approach, which combines the strengths of generative and discriminative models,
is suitable both for graphical models such as conditional random fields (CRFs)
and for sequence-to-sequence neural models. We perform a comparative evaluation
of three existing semi-supervised segmentation methods and one novel method. All
four systems are language-independent and have open-source implementations.
We improve on the previous best results for North Sámi morphological segmentation.
We see a relative improvement in morph boundary F1-score of 8.6% compared
to using the generative Morfessor FlatCat model directly, and of 2.4% compared to a
seq2seq baseline. Our neural sequence tagging system reaches almost the same
performance as the CRF topline.
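The abstract reports results as morph boundary F1-score and as relative improvements over baselines. As a rough illustration of what those two quantities mean, the following minimal Python sketch computes a simplified, micro-averaged boundary precision, recall and F1, plus a relative improvement. It is an assumption-laden sketch, not the evaluation procedure used in the paper: the function names are hypothetical, the toy English words are invented for illustration, and the example scores passed to `relative_improvement` are made up.

```python
def boundaries(morphs):
    """Return the set of character offsets at which a word is split."""
    positions = set()
    offset = 0
    for morph in morphs[:-1]:  # no boundary after the final morph
        offset += len(morph)
        positions.add(offset)
    return positions


def boundary_f1(gold, predicted):
    """Micro-averaged boundary precision, recall and F1 over a word list.

    Simplifying assumption: one gold analysis per word, counts pooled
    over all words.
    """
    tp = n_pred = n_gold = 0
    for gold_morphs, pred_morphs in zip(gold, predicted):
        g, p = boundaries(gold_morphs), boundaries(pred_morphs)
        tp += len(g & p)      # boundaries that appear in both
        n_pred += len(p)
        n_gold += len(g)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def relative_improvement(baseline, system):
    """Relative (not absolute) gain of `system` over `baseline`."""
    return (system - baseline) / baseline


# Toy data, invented for illustration only.
gold = [["un", "break", "able"], ["walk", "ed"]]
pred = [["un", "breakable"], ["walk", "ed"]]

p, r, f = boundary_f1(gold, pred)
print(f"{p:.3f} {r:.3f} {f:.3f}")  # 1.000 0.667 0.800

# Made-up scores: going from 0.50 to 0.55 is a 10% relative improvement.
print(f"{relative_improvement(0.50, 0.55):.1%}")  # 10.0%
```

Under this reading, the 8.6% and 2.4% figures in the abstract are ratios of F1-scores rather than differences in percentage points.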
Original language | English |
---|---|
Title of host publication | Fifth Workshop on Computational Linguistics for Uralic Languages |
Subtitle of host publication | Proceedings of the Workshop |
Publisher | Association for Computational Linguistics |
Pages | 15-26 |
Number of pages | 12 |
ISBN (Electronic) | 978-1-948087-92-6 |
Publication status | Published - 7 Jan 2019 |
MoE publication type | A4 Conference publication |
Event | International Workshop on Computational Linguistics for Uralic Languages - University of Tartu, Tartu, Estonia. Duration: 7 Jan 2019 → 8 Jan 2019 |
Workshop
Workshop | International Workshop on Computational Linguistics for Uralic Languages |
---|---|
Abbreviated title | IWCLUL |
Country/Territory | Estonia |
City | Tartu |
Period | 07/01/2019 → 08/01/2019 |
Keywords
- morphology
- segmentation
- low-resource settings
- semi-supervised learning
- sequence labeling
- recurrent neural networks
- conditional random fields
- North Sámi
Fingerprint
Dive into the research topics of 'North Sámi morphological segmentation with low-resource semi-supervised sequence labeling'. Together they form a unique fingerprint.
Projects
- 1 Finished
- MeMAD: Methods for Managing Audiovisual Data: Combining Automatic Efficiency with Human Accuracy
  Kurimo, M. (Principal investigator), Grönroos, S.-A. (Project Member), Brander, T. (Project Member), Porjazovski, D. (Project Member), Raitio, R. (Project Member), Grósz, T. (Project Member), Virkkunen, A. (Project Member) & Rouhe, A. (Project Member)
  27/12/2017 → 31/03/2021
  Project: EU: Framework programmes funding