XED: A Multilingual Dataset for Sentiment Analysis and Emotion Detection

Emily Öhman, Marc Pàmies, Kaisla Kajava, Jörg Tiedemann

Research output: Chapter in Book/Report/Conference proceeding; Conference article in proceedings; Scientific; peer-review

35 Citations (Scopus)

Abstract

We introduce XED, a multilingual fine-grained emotion dataset. The dataset consists of human-annotated Finnish (25k) and English (30k) sentences, as well as projected annotations for 30 additional languages, providing new resources for many low-resource languages. We annotate the dataset with Plutchik’s core emotions plus a neutral class, creating a multilabel, multiclass dataset. The dataset is carefully evaluated using language-specific BERT models and SVMs to show that XED performs on par with other similar datasets and is therefore a useful tool for sentiment analysis and emotion detection.
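The multilabel scheme described in the abstract can be pictured as a multi-hot vector over Plutchik's eight core emotions plus neutral. The following is only a minimal sketch: the label ordering, the `LABELS` list, and the `encode`/`decode` helpers are illustrative assumptions and do not reflect the dataset's actual file format or label numbering.

```python
# Plutchik's eight core emotions plus the neutral class mentioned in the abstract.
# The ordering is an illustrative assumption, not the dataset's own numbering.
LABELS = [
    "anger", "anticipation", "disgust", "fear",
    "joy", "sadness", "surprise", "trust", "neutral",
]
INDEX = {name: i for i, name in enumerate(LABELS)}


def encode(emotions):
    """Return a multi-hot vector with a 1 at each annotated emotion."""
    vec = [0] * len(LABELS)
    for emotion in emotions:
        vec[INDEX[emotion]] = 1  # raises KeyError for an unknown label
    return vec


def decode(vec):
    """Return the label names whose positions are set in a multi-hot vector."""
    return [name for name, bit in zip(LABELS, vec) if bit]


# A sentence annotated with two emotions at once (multilabel).
example = encode(["joy", "trust"])
print(example)
print(decode(example))
```

Because a sentence may carry several emotions at once, a vector of independent 0/1 entries is a natural fit here, whereas a single class index could only represent one label per sentence.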

Original language: English
Title of host publication: COLING 2020 - 28th International Conference on Computational Linguistics, Proceedings of the Conference
Editors: Donia Scott, Nuria Bel, Chengqing Zong
Publisher: Association for Computational Linguistics
Pages: 6542-6552
Number of pages: 11
ISBN (Electronic): 9781952148279
Publication status: Published - 2020
MoE publication type: A4 Conference publication
Event: International Conference on Computational Linguistics - Virtual, Online, Spain
Duration: 8 Dec 2020 to 13 Dec 2020
Conference number: 28

Conference

Conference: International Conference on Computational Linguistics
Abbreviated title: COLING
Country/Territory: Spain
City: Virtual, Online
Period: 08/12/2020 to 13/12/2020
