North Sámi morphological segmentation with low-resource semi-supervised sequence labeling

Stig-Arne Grönroos, Sami Virpioja, Mikko Kurimo

Research output: Chapter in book/conference proceedings, Conference contribution, Scientific, peer-reviewed


Abstract

Semi-supervised sequence labeling is an effective way to train a low-resource
morphological segmentation system. We show that a feature set augmentation
approach, which combines the strengths of generative and discriminative models,
is suitable both for graphical models like conditional random field (CRF) and
sequence-to-sequence neural models. We perform a comparative evaluation between
three existing and one novel semi-supervised segmentation methods. All
four systems are language-independent and have open-source implementations.
We improve on previous best results for North Sámi morphological segmentation.
We see a relative improvement in morph boundary F1-score of 8.6% compared
to using the generative Morfessor FlatCat model directly and 2.4% compared to a
seq2seq baseline. Our neural sequence tagging system reaches almost the same
performance as the CRF topline.
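
As a rough illustration of the metric named in the abstract, the sketch below computes a morph boundary F1-score and a relative improvement between two scores. It is not the authors' evaluation code: the function names, the micro-averaging over words, and the toy English segmentations are all assumptions made for this example, and the abstract does not state the exact averaging scheme used in the paper.

```python
def boundaries(segmentation):
    """Return the character offsets at which a morph boundary falls."""
    positions = set()
    offset = 0
    for morph in segmentation[:-1]:  # no boundary after the final morph
        offset += len(morph)
        positions.add(offset)
    return positions


def boundary_f1(predicted, reference):
    """Boundary precision, recall and F1, micro-averaged over words (an assumption)."""
    true_pos = n_pred = n_ref = 0
    for pred, ref in zip(predicted, reference):
        pred_b, ref_b = boundaries(pred), boundaries(ref)
        true_pos += len(pred_b & ref_b)
        n_pred += len(pred_b)
        n_ref += len(ref_b)
    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_ref if n_ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def relative_improvement(new_score, old_score):
    """Relative (not absolute) change of new_score over old_score."""
    return (new_score - old_score) / old_score


# Hypothetical toy data, not taken from the paper.
reference = [["un", "lock", "ed"], ["cat", "s"]]
predicted = [["un", "locked"], ["cat", "s"]]
precision, recall, f1 = boundary_f1(predicted, reference)
```

With these toy inputs the prediction finds two of the three reference boundaries and proposes no spurious ones, so precision is 1 and recall is 2/3. A relative improvement is measured against the old score, so a hypothetical rise from 0.50 to 0.55 counts as 10% relative, although it is only 5 points absolute.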
Original language: English
Title of host publication: Fifth Workshop on Computational Linguistics for Uralic Languages
Subtitle: Proceedings of the Workshop
Publisher: Association for Computational Linguistics
Pages: 15-26
Number of pages: 12
ISBN (electronic): 978-1-948087-92-6
Publication status: Published - 7 Jan 2019
MoE publication type: A4 Article in a conference publication
Event: International Workshop on Computational Linguistics for Uralic Languages - University of Tartu, Tartu, Estonia
Duration: 7 Jan 2019 - 8 Jan 2019

Workshop

Workshop: International Workshop on Computational Linguistics for Uralic Languages
Abbreviated title: IWCLUL
Country/Territory: Estonia
City: Tartu
Period: 07/01/2019 - 08/01/2019
