North Sámi morphological segmentation with low-resource semi-supervised sequence labeling

Stig-Arne Grönroos, Sami Virpioja, Mikko Kurimo

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

Abstract

Semi-supervised sequence labeling is an effective way to train a low-resource
morphological segmentation system. We show that a feature set augmentation
approach, which combines the strengths of generative and discriminative models,
is suitable both for graphical models like conditional random fields (CRFs) and
for sequence-to-sequence neural models. We perform a comparative evaluation of
three existing semi-supervised segmentation methods and one novel method. All
four systems are language-independent and have open-source implementations.
We improve on previous best results for North Sámi morphological segmentation.
We see a relative improvement in morph boundary F1-score of 8.6% compared
to using the generative Morfessor FlatCat model directly and 2.4% compared to a
seq2seq baseline. Our neural sequence tagging system reaches almost the same
performance as the CRF topline.
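
To make the terms in the abstract concrete, the following is a minimal Python sketch of how a segmentation can be cast as per-character labels and how a morph boundary F1-score and a relative improvement can be computed. It is not the authors' implementation: the B/M/E/S tag set, the micro-averaging over the whole word list, the function names, and the English toy word are all assumptions made for illustration, and the paper's own label set and evaluation protocol may differ.

```python
from typing import List, Set


def to_bmes(morphs: List[str]) -> List[str]:
    """Label each character as Begin/Middle/End of a morph, or Single.

    The BMES scheme is an assumed, commonly used tag set; the paper's
    actual label inventory is not given in the abstract.
    """
    tags: List[str] = []
    for morph in morphs:
        if len(morph) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(morph) - 2) + ["E"])
    return tags


def boundaries(morphs: List[str]) -> Set[int]:
    """Character offsets of the word-internal morph boundaries."""
    positions: Set[int] = set()
    offset = 0
    for morph in morphs[:-1]:  # the end of the word is not a boundary
        offset += len(morph)
        positions.add(offset)
    return positions


def boundary_f1(predicted: List[List[str]], reference: List[List[str]]) -> float:
    """Micro-averaged boundary F1 over a word list (averaging is an assumption)."""
    tp = fp = fn = 0
    for pred, ref in zip(predicted, reference):
        p, r = boundaries(pred), boundaries(ref)
        tp += len(p & r)   # boundaries found in both
        fp += len(p - r)   # predicted but not in the reference
        fn += len(r - p)   # in the reference but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def relative_improvement(new_score: float, old_score: float) -> float:
    """Relative (not absolute) change, e.g. 0.086 corresponds to 8.6%."""
    return (new_score - old_score) / old_score


if __name__ == "__main__":
    # Hypothetical toy example, not data from the paper.
    gold = [["un", "lock", "ed"]]
    guess = [["un", "locked"]]
    print(to_bmes(gold[0]))
    print(boundary_f1(guess, gold))
```

Note that the percentages reported in the abstract are relative improvements in the sense of `relative_improvement`, so they express a ratio to the baseline score rather than a difference in F1 points.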
Original language: English
Title of host publication: Fifth Workshop on Computational Linguistics for Uralic Languages
Subtitle of host publication: Proceedings of the Workshop
Pages: 15-26
Number of pages: 12
ISBN (Electronic): 978-1-948087-92-6
Publication status: Published - 7 Jan 2019
MoE publication type: A4 Article in a conference publication
Event: International Workshop on Computational Linguistics for Uralic Languages - University of Tartu, Tartu, Estonia
Duration: 7 Jan 2019 to 8 Jan 2019

Workshop

Workshop: International Workshop on Computational Linguistics for Uralic Languages
Abbreviated title: IWCLUL
Country/Territory: Estonia
City: Tartu
Period: 07/01/2019 to 08/01/2019

Keywords

  • morphology
  • segmentation
  • low-resource settings
  • semi-supervised learning
  • sequence labeling
  • recurrent neural networks
  • conditional random fields
  • North Sámi
