Unsupervised morph segmentation and statistical language models for vocabulary expansion

Matti Varjokallio, Dietrich Klakow

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Scientific, peer-review

Abstract

This work explores the use of unsupervised morph segmentation together with statistical language models for the task of vocabulary expansion. Unsupervised vocabulary expansion has great potential for improving vocabulary coverage and performance in various natural language processing tasks, especially in less-resourced settings with morphologically rich languages. We propose a combination of unsupervised morph segmentation and statistical language models and evaluate it on languages from the Babel corpus. The method is shown to perform well for all the evaluated languages compared with previous work on the task.
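The abstract does not describe the model in detail, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes words have already been split into morphs by some unsupervised segmenter, trains a maximum-likelihood bigram model over those morphs, and proposes unseen words by chaining morphs. The toy Finnish-like segmentations, the names `SEGMENTED`, `train_bigrams` and `expand`, and the `max_morphs` limit are all hypothetical and not taken from the paper or the Babel corpus.

```python
from collections import Counter, defaultdict
import math

# Hypothetical toy segmentations (assumption): in practice these would come
# from an unsupervised morph segmenter, not be written by hand.
SEGMENTED = {
    "taloissa": ["talo", "i", "ssa"],
    "autoista": ["auto", "i", "sta"],
    "talossa": ["talo", "ssa"],
}

START, END = "<w>", "</w>"  # word-boundary symbols


def train_bigrams(segmented):
    """Count morph bigrams, padding each word with boundary symbols."""
    counts = defaultdict(Counter)
    for morphs in segmented.values():
        seq = [START] + morphs + [END]
        for prev, cur in zip(seq, seq[1:]):
            counts[prev][cur] += 1
    return counts


def expand(counts, known_words, max_morphs=3):
    """Enumerate unseen words and their bigram log-probabilities."""
    results = {}
    stack = [(START, [], 0.0)]
    while stack:
        prev, morphs, logp = stack.pop()
        total = sum(counts[prev].values())
        for cur, c in counts[prev].items():
            lp = logp + math.log(c / total)  # unsmoothed ML estimate
            if cur == END:
                word = "".join(morphs)
                if word and word not in known_words:
                    # Keep the best-scoring morph path for each surface form.
                    results[word] = max(lp, results.get(word, -math.inf))
            elif len(morphs) < max_morphs:
                stack.append((cur, morphs + [cur], lp))
    # Most probable candidates first.
    return sorted(results.items(), key=lambda kv: -kv[1])


if __name__ == "__main__":
    model = train_bigrams(SEGMENTED)
    for word, logp in expand(model, set(SEGMENTED)):
        print(f"{word}\t{logp:.3f}")
```

On this toy data the sketch proposes word forms that recombine observed stems and suffixes but never appeared in the training vocabulary. A realistic system would need smoothing, a larger model order, and a cut-off on the number of candidates added to the vocabulary.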
Original language: English
Title of host publication: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016
Pages: 175-180
Number of pages: 6
Volume: 2
ISBN (Electronic): 978-1-945626-01-2
Publication status: Published - 2016
MoE publication type: A4 Article in a conference publication
Event: Annual Meeting of the Association for Computational Linguistics - Berlin, Germany
Duration: 7 Aug 2016 to 12 Aug 2016
Conference number: 54

Publication series

Name: Annual Meeting of the Association for Computational Linguistics
Publisher: Association for Computational Linguistics
ISSN (Print): 0736-587X

Conference

Conference: Annual Meeting of the Association for Computational Linguistics
Abbreviated title: ACL
Country: Germany
City: Berlin
Period: 07/08/2016 to 12/08/2016
