TY - JOUR
T1 - A comparative study of minimally supervised morphological segmentation
AU - Ruokolainen, Teemu
AU - Kohonen, Oskar
AU - Sirts, Kairit
AU - Grönroos, Stig Arne
AU - Kurimo, Mikko
AU - Virpioja, Sami
N1 - VK: Kaski, S.
PY - 2016/3/1
Y1 - 2016/3/1
AB - This article presents a comparative study of a subfield of morphology learning referred to as minimally supervised morphological segmentation. In morphological segmentation, word forms are segmented into morphs, the surface forms of morphemes. In the minimally supervised data-driven learning setting, segmentation models are learned from a small number of manually annotated word forms and a large set of unannotated word forms. In addition to providing a literature survey on published methods, we present an in-depth empirical comparison on three diverse model families, including a detailed error analysis. Based on the literature survey, we conclude that the existing methodology contains substantial work on generative morph lexicon-based approaches and methods based on discriminative boundary detection. As for which approach has been more successful, both the previous work and the empirical evaluation presented here strongly imply that the current state of the art is yielded by the discriminative boundary detection methodology.
UR - http://www.scopus.com/inward/record.url?scp=84964048195&partnerID=8YFLogxK
U2 - 10.1162/COLI_a_00243
DO - 10.1162/COLI_a_00243
M3 - Article
SN - 0891-2017
VL - 42
SP - 91
EP - 120
JO - Computational Linguistics
JF - Computational Linguistics
IS - 1
ER -