Abstract
We introduce XED, a multilingual fine-grained emotion dataset. The dataset consists of human-annotated Finnish (25k) and English (30k) sentences, as well as projected annotations for 30 additional languages, providing new resources for many low-resource languages. We annotate the dataset with Plutchik’s core emotions, plus neutral, to create a multilabel multiclass dataset. We evaluate the dataset using language-specific BERT models and SVMs, showing that XED performs on par with similar datasets and is therefore a useful resource for sentiment analysis and emotion detection.
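As an illustration of what "multilabel multiclass" means here, the following minimal sketch encodes a sentence's emotion annotations as a multi-hot vector over Plutchik's eight core emotions plus neutral. The label ordering, the function names, and the example annotation are assumptions made for illustration; they are not taken from the dataset's own files or tooling.

```python
from typing import Iterable, List

# Plutchik's eight core emotions plus "neutral". This ordering is an
# assumption for illustration, not necessarily the dataset's own indexing.
LABELS: List[str] = [
    "anger", "anticipation", "disgust", "fear",
    "joy", "sadness", "surprise", "trust", "neutral",
]
LABEL_TO_INDEX = {name: i for i, name in enumerate(LABELS)}


def encode_multi_hot(emotions: Iterable[str]) -> List[int]:
    """Turn a collection of emotion names into a fixed-length 0/1 vector."""
    vector = [0] * len(LABELS)
    for name in emotions:
        vector[LABEL_TO_INDEX[name]] = 1
    return vector


def decode_multi_hot(vector: List[int]) -> List[str]:
    """Recover the emotion names whose positions are set to 1."""
    return [LABELS[i] for i, flag in enumerate(vector) if flag]


# A hypothetical sentence annotated with two emotions at once.
example = encode_multi_hot(["joy", "surprise"])
print(example)
print(decode_multi_hot(example))
```

Because a sentence can carry several emotions at once, each position in the vector is an independent yes/no decision, which is what distinguishes this setup from single-label classification.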
Original language | English |
---|---|
Title of host publication | COLING 2020 - 28th International Conference on Computational Linguistics, Proceedings of the Conference |
Editors | Donia Scott, Nuria Bel, Chengqing Zong |
Publisher | Association for Computational Linguistics |
Pages | 6542-6552 |
Number of pages | 11 |
ISBN (Electronic) | 9781952148279 |
Publication status | Published - 2020 |
MoE publication type | A4 Conference publication |
Event | International Conference on Computational Linguistics - Virtual, Online, Spain. Duration: 8 Dec 2020 → 13 Dec 2020. Conference number: 28 |
Conference
Conference | International Conference on Computational Linguistics |
---|---|
Abbreviated title | COLING |
Country/Territory | Spain |
City | Virtual, Online |
Period | 08/12/2020 → 13/12/2020 |