Service registration chatbot: collecting and comparing dialogues from AMT workers and service’s users

Luca Molteni, Mittul Singh, Juho Leinonen, Katri Leino, Mikko Kurimo, Emanuele Della Valle

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

Abstract

Crowdsourcing is the go-to solution for data collection and annotation in NLP tasks. Nevertheless, crowdsourced data is noisy by nature: the source is often unknown, and additional validation work is performed to guarantee the dataset’s quality. In this article, we compare two crowdsourcing sources on a dialogue paraphrasing task revolving around a chatbot service. We observe that workers hired on crowdsourcing platforms produce lexically poorer and less diverse rewrites than service users engaged voluntarily. Notably, the human-perceived quality of the two paraphrase sources does not differ significantly in dialogue clarity or optimality. Furthermore, for the chatbot service, the combined crowdsourced data is enough to train a transformer-based Natural Language Generation (NLG) system. To enable similar services, we also release tools for collecting data and training the dialogue-act-based, transformer-based NLG module.
Original language: English
Title of host publication: Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)
Pages: 116-121
Number of pages: 6
ISBN (Electronic): 978-1-952148-76-7
Publication status: Published - Nov 2020
MoE publication type: A4 Article in a conference publication
Event: Conference on Empirical Methods in Natural Language Processing - Virtual, Online
Duration: 16 Nov 2020 to 20 Nov 2020

Conference

Conference: Conference on Empirical Methods in Natural Language Processing
City: Virtual, Online
Period: 16/11/2020 to 20/11/2020

Keywords

  • chatbot
  • NLP
  • Machine Talking to Machines
  • crowdsourcing
