Abstract
Semi-supervised sequence labeling is an effective way to train a low-resource
morphological segmentation system. We show that a feature set augmentation
approach, which combines the strengths of generative and discriminative models,
is suitable both for graphical models like conditional random fields (CRF) and
for sequence-to-sequence neural models. We perform a comparative evaluation
between three existing and one novel semi-supervised segmentation methods. All
four systems are language-independent and have open-source implementations.
We improve on previous best results for North Sámi morphological segmentation.
We see a relative improvement in morph boundary F1-score of 8.6% compared
to using the generative Morfessor FlatCat model directly and 2.4% compared to a
seq2seq baseline. Our neural sequence tagging system reaches almost the same
performance as the CRF topline.
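The abstract reports results as morph boundary F1-score and as relative improvement. As a rough illustration of what those two quantities mean, the sketch below computes a micro-averaged boundary F1 against a single reference segmentation per word. This is an assumption made for brevity: the evaluation protocol used in the paper is not described in this record and may differ (for example in averaging or in handling several reference analyses). The function names, the `+` separator, and the toy English words are hypothetical and are not data from the paper.

```python
def boundaries(segmented, sep="+"):
    """Return the set of character offsets at which a morph boundary occurs."""
    positions = set()
    offset = 0
    # The last morph ends at the word end, which is not counted as a boundary.
    for morph in segmented.split(sep)[:-1]:
        offset += len(morph)
        positions.add(offset)
    return positions


def boundary_f1(predicted, reference):
    """Micro-averaged boundary F1 over paired segmentations (assumed protocol)."""
    tp = fp = fn = 0
    for pred, ref in zip(predicted, reference):
        p, r = boundaries(pred), boundaries(ref)
        tp += len(p & r)   # boundaries found in both
        fp += len(p - r)   # predicted but not in the reference
        fn += len(r - p)   # in the reference but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def relative_improvement(new_score, old_score):
    """Relative (not absolute) improvement of new_score over old_score."""
    return (new_score - old_score) / old_score


# Toy usage with made-up segmentations.
toy_predicted = ["unhappi+ness", "ca+ts"]
toy_reference = ["un+happi+ness", "cat+s"]
toy_f1 = boundary_f1(toy_predicted, toy_reference)
```

A relative improvement of 8.6% therefore means the new score is 1.086 times the old one, not that it is 8.6 percentage points higher.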
Original language | English |
---|---|
Title | Fifth Workshop on Computational Linguistics for Uralic Languages |
Subtitle | Proceedings of the Workshop |
Publisher | Association for Computational Linguistics |
Pages | 15-26 |
Number of pages | 12 |
ISBN (electronic) | 978-1-948087-92-6 |
Status | Published - 7 Jan 2019 |
Ministry of Education and Culture (OKM) publication type | A4 Article in conference proceedings |
Event | International Workshop on Computational Linguistics for Uralic Languages - University of Tartu, Tartu, Estonia. Duration: 7 Jan 2019 → 8 Jan 2019 |
Workshop
Workshop | International Workshop on Computational Linguistics for Uralic Languages |
---|---|
Abbreviated title | IWCLUL |
Country/Territory | Estonia |
City | Tartu |
Period | 07/01/2019 → 08/01/2019 |
Fingerprint
Dive into the research topics of 'North Sámi morphological segmentation with low-resource semi-supervised sequence labeling'. Together they form a unique fingerprint.
Projects
- 1 Finished
- MeMAD: Methods for Managing Audiovisual Data: Combining Automatic Efficiency with Human Accuracy
  Kurimo, M., Grönroos, S., Brander, T., Porjazovski, D., Raitio, R., Rouhe, A., Grósz, T. & Virkkunen, A.
  27/12/2017 → 31/03/2021
  Project: EU: Framework programmes funding