TY - JOUR
T1 - FinnPos
T2 - an open-source morphological tagging and lemmatization toolkit for Finnish
AU - Silfverberg, Miikka
AU - Ruokolainen, Teemu
AU - Lindén, Krister
AU - Kurimo, Mikko
PY - 2016/12/1
Y1 - 2016/12/1
AB - This paper describes FinnPos, an open-source morphological tagging and lemmatization toolkit for Finnish. The morphological tagging model is based on the averaged structured perceptron classifier. Given training data, new taggers are estimated in a computationally efficient manner using a combination of beam search and model cascade. The lemmatization is performed employing a combination of a rule-based morphological analyzer, OMorFi, and a data-driven lemmatization model. The toolkit is readily applicable for tagging and lemmatization of running text with models learned from the recently published Finnish Turku Dependency Treebank and FinnTreeBank. Empirical evaluation on these corpora shows that FinnPos performs favorably compared to reference systems in terms of tagging and lemmatization accuracy. In addition, we demonstrate that our system is highly competitive with regard to computational efficiency of learning new models and assigning analyses to novel sentences.
KW - Averaged perceptron
KW - Data-driven lemmatization
KW - Finnish
KW - Morphological tagging
KW - Open-source
UR - http://www.scopus.com/inward/record.url?scp=84949751796&partnerID=8YFLogxK
U2 - 10.1007/s10579-015-9326-3
DO - 10.1007/s10579-015-9326-3
M3 - Article
AN - SCOPUS:84949751796
VL - 50
SP - 863
EP - 878
JO - Language Resources and Evaluation
JF - Language Resources and Evaluation
SN - 1574-020X
IS - 4
ER -