Evaluating Large Language Models in Generating Synthetic HCI Research Data: a Case Study

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceedings › Scientific › peer-review

Abstract

Collecting data is one of the bottlenecks of Human-Computer Interaction (HCI) research. Motivated by this, we explore the potential of large language models (LLMs) in generating synthetic user research data. We use OpenAI’s GPT-3 model to generate open-ended questionnaire responses about experiencing video games as art, a topic not tractable with traditional computational user models. We test whether synthetic responses can be distinguished from real responses, analyze errors of synthetic data, and investigate content similarities between synthetic and real data. We conclude that GPT-3 can, in this context, yield believable accounts of HCI experiences. Given the low cost and high speed of LLM data generation, synthetic data should be useful in ideating and piloting new experiments, although any findings must obviously always be validated with real data. The results also raise concerns: if employed by malicious users of crowdsourcing services, LLMs may make crowdsourcing of self-report data fundamentally unreliable.
Original language: English
Title of host publication: Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI ’23)
Publisher: ACM
Number of pages: 19
ISBN (Electronic): 978-1-4503-9421-5
Publication status: Published - 19 Apr 2023
MoE publication type: A4 Conference publication
Event: ACM SIGCHI Annual Conference on Human Factors in Computing Systems - Hamburg, Germany
Duration: 23 Apr 2023 to 28 Apr 2023
https://chi2023.acm.org/

Conference

Conference: ACM SIGCHI Annual Conference on Human Factors in Computing Systems
Abbreviated title: ACM CHI
Country/Territory: Germany
City: Hamburg
Period: 23/04/2023 to 28/04/2023
