Evaluating Morphological Generalisation in Machine Translation by Distribution-Based Compositionality Assessment

Anssi Moisio, Mathias Creutz, Mikko Kurimo

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceedings › Scientific › peer-review

Abstract

Compositional generalisation refers to the ability to understand and generate a potentially infinite number of novel meanings using a finite set of known primitives and a set of rules for combining them. The degree to which artificial neural networks can learn this ability is an open question. Several evaluation methods and benchmarks have recently been proposed to test compositional generalisation, but few have focused on the morphological level of language. We propose applying the previously developed distribution-based compositionality assessment (DBCA) method to assess morphological generalisation in NLP tasks such as machine translation or paraphrase detection. We demonstrate our method by comparing translation systems that use different byte-pair encoding (BPE) vocabulary sizes. Results obtained with the proposed evaluation method suggest that small vocabularies help with morphological generalisation in neural machine translation (NMT).
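Distribution-based compositionality assessment (DBCA), introduced by Keysers et al. (2020), compares how "atoms" (primitives) and "compounds" (their combinations) are distributed in the training and test sets, aiming for splits with low atom divergence but high compound divergence. The sketch below computes both divergences as 1 minus the Chernoff coefficient C_alpha(P||Q) = sum_k p_k^alpha * q_k^(1-alpha), using alpha = 0.5 for atoms and alpha = 0.1 for compounds, with P taken as the training and Q as the test distribution. The token representation (a lemma plus a tuple of morphological tags), the choice of atoms (lemmas and individual tags) and compounds (a lemma with its full tag set), and all function names are hypothetical illustrations, not necessarily the exact definitions used in the paper.

```python
from collections import Counter

# Hypothetical token representation: (lemma, tuple of morphological tags).
Word = tuple[str, tuple[str, ...]]


def normalise(counts: Counter) -> dict:
    """Turn raw frequency counts into a probability distribution."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()}


def chernoff_divergence(p_counts: Counter, q_counts: Counter, alpha: float) -> float:
    """Return 1 - C_alpha(P || Q), where C_alpha = sum_k p_k**alpha * q_k**(1 - alpha).

    Items missing from either distribution contribute nothing to the coefficient.
    """
    p = normalise(p_counts)
    q = normalise(q_counts)
    coefficient = sum(p[k] ** alpha * q[k] ** (1 - alpha) for k in p.keys() & q.keys())
    return 1.0 - coefficient


def atoms_and_compounds(words: list[Word]) -> tuple[Counter, Counter]:
    """Count atoms (lemmas and individual tags) and compounds (lemma + full tag set).

    This atom/compound definition is an illustrative assumption.
    """
    atoms: Counter = Counter()
    compounds: Counter = Counter()
    for lemma, tags in words:
        atoms[lemma] += 1
        atoms.update(tags)  # each tag counts as a separate atom
        compounds[(lemma, tags)] += 1
    return atoms, compounds


def dbca_divergences(train: list[Word], test: list[Word]) -> tuple[float, float]:
    """Atom divergence (alpha = 0.5) and compound divergence (alpha = 0.1); P = train, Q = test."""
    train_atoms, train_compounds = atoms_and_compounds(train)
    test_atoms, test_compounds = atoms_and_compounds(test)
    atom_div = chernoff_divergence(train_atoms, test_atoms, alpha=0.5)
    compound_div = chernoff_divergence(train_compounds, test_compounds, alpha=0.1)
    return atom_div, compound_div


if __name__ == "__main__":
    train = [("kissa", ("Case=Nom",)), ("kissa", ("Case=Ine",)), ("talo", ("Case=Nom",))]
    test = [("talo", ("Case=Ine",))]  # lemma and case both seen in training, never together
    atom_div, compound_div = dbca_divergences(train, test)
    print(f"atom divergence {atom_div:.3f}, compound divergence {compound_div:.3f}")
```

In the toy example, the test word's lemma and case tag both occur in training but never in combination, so the script prints an atom divergence of 0.423 and a compound divergence of 1.000: exactly the "familiar atoms, novel compounds" situation that a morphological generalisation test is meant to probe.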

Original language: English
Title of host publication: Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)
Publisher: Tartu Ülikool
Number of pages: 14
ISBN (Electronic): 978-9916-21-999-7
Publication status: Published - 24 May 2023
MoE publication type: A4 Conference publication
Event: Nordic Conference on Computational Linguistics - Tórshavn, Faroe Islands
Duration: 22 May 2023 to 24 May 2023
Conference number: 24

Publication series

Name: NEALT proceedings series
Publisher: University of Tartu Library
Volume: 52
ISSN (Print): 1736-8197
ISSN (Electronic): 1736-6305

Conference

Conference: Nordic Conference on Computational Linguistics
Abbreviated title: NoDaLiDa
Country/Territory: Faroe Islands
City: Tórshavn
Period: 22/05/2023 to 24/05/2023
