Word embeddings for morphologically rich languages

Pyry Takala*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

2 Citations (Scopus)

Abstract

Word-embedding models commonly treat words as unique symbols, for which a lower-dimensional embedding can be looked up. These representations generalize poorly to morphologically rich languages, as vectors for all possible inflections cannot be stored, and words with the same stem do not share a similar representation. We study alternative representations for words, including one subword model and two character-based models. Our methods outperform classical word embeddings for a morphologically rich language, Finnish, on tasks requiring sophisticated understanding of grammar and context. Our embeddings are easier to implement than previously proposed methods, and can be used to form word representations for any common language-processing task.
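
To make the contrast in the abstract concrete, the sketch below is an illustration only, not the models evaluated in the paper: the Finnish example words, the vector dimensionality, and the hashed character-trigram scheme are assumptions chosen for this example. It compares a word-level lookup table, which fails on an unseen inflection, with a character n-gram composition that still produces a similar vector for forms sharing a stem.

```python
# Illustrative sketch only (not the paper's models): lookup embeddings vs.
# a character n-gram composition for an unseen Finnish inflection.
import zlib
import numpy as np

dim = 8
rng = np.random.default_rng(0)

# Classical word embedding: one vector per surface form seen in training.
word_vectors = {
    "talo": rng.normal(size=dim),      # "house"
    "talossa": rng.normal(size=dim),   # "in the house"
}

def lookup(word):
    # Unseen inflections such as "taloissa" ("in the houses") have no entry.
    return word_vectors.get(word)

def char_ngram_vector(word, n=3, buckets=1000):
    # Compose a word vector from hashed character trigrams, so inflected
    # forms that share a stem also share most of their trigram vectors.
    padded = f"<{word}>"
    vec = np.zeros(dim)
    count = 0
    for i in range(len(padded) - n + 1):
        bucket = zlib.crc32(padded[i:i + n].encode()) % buckets
        vec += np.random.default_rng(bucket).normal(size=dim)
        count += 1
    return vec / max(count, 1)

print(lookup("taloissa"))  # None: out-of-vocabulary under the lookup model

a = char_ngram_vector("talossa")
b = char_ngram_vector("taloissa")
cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(cosine)  # clearly positive: the two forms share their stem trigrams
```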

Original language: English
Title of host publication: ESANN 2016 - 24th European Symposium on Artificial Neural Networks
Pages: 177-182
Number of pages: 6
ISBN (Electronic): 9782875870278
Publication status: Published - 2016
MoE publication type: A4 Article in a conference publication
Event: European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning - Bruges, Belgium
Duration: 27 Apr 2016 - 29 Apr 2016
Conference number: 24

Conference

Conference: European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning
Abbreviated title: ESANN
Country: Belgium
City: Bruges
Period: 27/04/2016 - 29/04/2016