Dataless piloting: Developing context-sensitive computational approaches to online social discourses

Henna Paakki*, Kaisla Kajava, Minttu Tikka

*Corresponding author for this work

Research output: Contribution to conference › Abstract › Scientific › peer-review

Abstract

This work introduces a lightweight piloting phase into the development of novel theory-driven computational models. Computational models are increasingly used for understanding social data. However, their emphasis is often on technical aspects, whereas the theoretical basis for understanding social phenomena and language can be thin. This lack of theory may leave the meanings behind social data insufficiently interpreted (Lindgren, 2020), even though these meanings often depend on a specific context (Gumperz & Levinson, 1996).

Developing novel computational models that draw inspiration from theory and a specific empirical phenomenon can require a lot of resources: data collection, annotation work, time and expertise. We suggest that a dataless piloting phase can be useful for research investigating social phenomena through a theory-led computational lens. Piloting a novel theory-based computational model using dataless machine learning can enable better consideration of context-specific needs, and reveal possible biases (Caliskan et al., 2017; Weidinger et al., 2022) or problems of the model at an early phase of research.

We illustrate the piloting phase by utilizing 0-shot Natural Language Inference (NLI) (Yin et al., 2019). NLI is a state-of-the-art computational approach that can be used for analyzing diverse textual social discourses. We draw from two of our past and ongoing studies on online harassment and crisis discourses. We show how a theoretical frame together with qualitative analysis can inform and justify model design and category selection for data classification or computational analysis. However, theory or context-led modeling may face the problem that there exist no suitable computational models that share our theoretical assumptions, nor suitable annotated datasets for training a model. We argue that 0-shot NLI offers a solution to these problems, as it allows us to automatically label data even with novel categories for which we do not have annotated training data. Although 0-shot NLI might not be as accurate as a model trained for a specific purpose, it is quite fast to set up for a pilot, and in some cases may suffice for the intended empirical analysis. We test and evaluate our pilot models to reveal potential caveats, and suggest methods to mitigate them in later development phases. Finally, we present a step-by-step piloting process for evaluating central parts of model design: application of theory, technical metrics, and related ethical aspects.
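
The labelling idea described above can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: the hypothesis template, the function names, and the example labels are assumptions made for this sketch, and `toy_entailment_score` is a crude word-overlap stand-in for a real pretrained NLI model, used only so the example runs without downloading anything.

```python
from typing import Callable, Dict, List

# Hypothetical template; the abstract does not specify one.
HYPOTHESIS_TEMPLATE = "This text is about {}."


def zero_shot_label(
    text: str,
    labels: List[str],
    entailment_score: Callable[[str, str], float],
    template: str = HYPOTHESIS_TEMPLATE,
) -> Dict[str, float]:
    """Score each candidate label by how strongly `text` (the premise)
    entails the hypothesis built from that label."""
    return {
        label: entailment_score(text, template.format(label))
        for label in labels
    }


def toy_entailment_score(premise: str, hypothesis: str) -> float:
    """Stand-in for an NLI model (assumption for illustration only):
    the fraction of hypothesis words that also occur in the premise."""
    premise_words = set(premise.lower().split())
    hypothesis_words = hypothesis.lower().rstrip(".").split()
    overlap = sum(1 for word in hypothesis_words if word in premise_words)
    return overlap / len(hypothesis_words)


def best_label(scores: Dict[str, float]) -> str:
    """Return the label with the highest entailment score."""
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Novel categories can be supplied without any annotated training data.
    example = "the flood warning caused panic and a crisis in the city"
    scores = zero_shot_label(example, ["crisis", "harassment"], toy_entailment_score)
    print(scores, best_label(scores))
```

In an actual pilot, `entailment_score` would wrap a pretrained NLI model; for instance, the Hugging Face `transformers` library provides a `zero-shot-classification` pipeline that accepts `candidate_labels` and a `hypothesis_template`. Keeping the scorer pluggable makes it easy to change the category set or template while the theoretical frame is still being refined.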

We argue that a close-knit synthesis of theoretical insights, qualitative research and computational modeling can provide insightful perspectives on social phenomena online. Bridging these requires non-trivial consideration of theory and empirical context, as well as their interpretation in computational model design. We contend that a dataless pilot phase can help to account for these requirements: it enables early evaluation and redesign of the model, adjustments in the application of theory, savings in resources, and a basis for robust models that are rooted in a well-researched theoretical frame. Piloting can thus promote more sustainable and ethical design by enabling less biased models (Hutchinson et al., 2021) and fairness in the interpretation of social data.
Original language: English
Pages: 74-75
Number of pages: 2
Publication status: Published - 1 Dec 2022
MoE publication type: Not Eligible
Event: Digital Research Data and Human Sciences: Diversity of Methods and Materials - University of Jyväskylä, Jyväskylä, Finland
Duration: 1 Dec 2022 to 3 Dec 2022
https://www.jyu.fi/en/congress/drdhum2022

Conference

Conference: Digital Research Data and Human Sciences
Abbreviated title: DRDHum
Country/Territory: Finland
City: Jyväskylä
Period: 01/12/2022 to 03/12/2022

Keywords

  • piloting
  • Natural Language Processing
  • Social Computing
  • theory-based models
  • social media
  • contextual interpretation
