Abstract
This work introduces a lightweight piloting phase into the development of novel theory-driven computational models. Computational models are increasingly used for understanding social data. However, their emphasis is often on technical aspects, whereas the theoretical basis for understanding social phenomena and language can be thin. This lack of theory may lead to shallow interpretation of the meanings behind social data (Lindgren, 2020), which often depend on a specific context (Gumperz & Levinson, 1996).
Developing novel computational models that draw inspiration from theory and a specific empirical phenomenon can require substantial resources: data collection, annotation work, time and expertise. We suggest that a dataless piloting phase can be useful for research investigating social phenomena through a theory-led computational lens. Piloting a novel theory-based computational model using dataless machine learning can enable better consideration of context-specific needs, and reveal possible biases (Caliskan et al., 2017; Weidinger et al., 2022) or other problems in the model at an early phase of research.
We illustrate the piloting phase by utilizing 0-shot Natural Language Inference (NLI) (Yin et al., 2019). NLI is a state-of-the-art computational approach that can be used for analyzing diverse textual social discourses. We draw from two of our past and ongoing studies on online harassment and crisis discourses. We show how a theoretical frame together with qualitative analysis can inform and justify model design and category selection for data classification or computational analysis. However, theory-led or context-led modeling may face the problem that no existing computational model shares our theoretical assumptions, and that no suitable annotated dataset exists for training one. We argue that 0-shot NLI offers a solution to these problems, as it allows us to automatically label data even with novel categories for which we do not have annotated training data. Although 0-shot NLI might not be as accurate as a model trained for a specific purpose, it is fast to set up for a pilot, and in some cases may suffice for the intended empirical analysis. We test and evaluate our pilot models to reveal potential caveats, and suggest methods to mitigate them in later development phases. Finally, we present a step-by-step piloting process for evaluating central parts of model design: application of theory, technical metrics, and related ethical aspects.
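The labeling idea behind 0-shot NLI can be sketched in a few lines: each candidate category is turned into a hypothesis sentence, an entailment score is computed between the text (premise) and each hypothesis, and the scores are compared. The sketch below is illustrative only and is not the models used in the studies. The hypothesis template, the function names, and in particular `toy_entailment_score` (a word-overlap stand-in for a real pretrained NLI model) are assumptions made so that the example runs without any external dependencies.

```python
import re
from typing import Callable, Dict, List

# Hypothetical template that turns a category name into an NLI hypothesis.
HYPOTHESIS_TEMPLATE = "This text is about {}."
_TEMPLATE_WORDS = {"this", "text", "is", "about"}


def _words(s: str) -> List[str]:
    """Lowercase a string and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", s.lower())


def toy_entailment_score(premise: str, hypothesis: str) -> float:
    """Stand-in for an NLI model (assumption, not a real model).

    Returns the fraction of the hypothesis' content words that also
    occur in the premise. A pilot would replace this function with
    the entailment probability from a pretrained NLI model.
    """
    content = [w for w in _words(hypothesis) if w not in _TEMPLATE_WORDS]
    if not content:
        return 0.0
    premise_words = set(_words(premise))
    return sum(w in premise_words for w in content) / len(content)


def zero_shot_label(
    text: str,
    labels: List[str],
    score_fn: Callable[[str, str], float] = toy_entailment_score,
    template: str = HYPOTHESIS_TEMPLATE,
) -> Dict[str, float]:
    """Score every candidate label as an NLI hypothesis about `text`.

    Scores are normalized to sum to 1 when at least one is positive.
    """
    raw = {label: score_fn(text, template.format(label)) for label in labels}
    total = sum(raw.values())
    return {label: (s / total if total > 0 else 0.0) for label, s in raw.items()}


def best_label(scores: Dict[str, float]) -> str:
    """Return the label with the highest score."""
    return max(scores, key=scores.get)


if __name__ == "__main__":
    scores = zero_shot_label(
        "The flood warning caused a crisis in the city.",
        ["harassment", "crisis"],
    )
    print(scores, best_label(scores))
```

Because the categories are passed in as plain strings, theory-derived labels can be swapped in or reworded without retraining anything, which is what makes this setup quick to pilot.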
We argue that a close-knit synthesis of theoretical insights, qualitative research and computational modeling can provide insightful perspectives on social phenomena online. Bridging these three requires non-trivial considerations of theory and empirical context, as well as their interpretation in computational model design. We propose that a dataless pilot phase can help to account for these requirements: it enables early evaluation and redesign of the model, adjustments in the application of theory, saving of resources, and a basis for robust models that are rooted in a well-researched theoretical frame. Piloting can thus promote more sustainable and ethical design by enabling less biased models (Hutchinson et al., 2021) and fairness in the interpretation of social data.
Original language | English |
---|---|
Pages | 74-75 |
Number of pages | 2 |
Publication status | Published - 1 Dec 2022 |
MoE publication type | Not Eligible |
Event | Digital Research Data and Human Sciences: Diversity of Methods and Materials, University of Jyväskylä, Jyväskylä, Finland. Duration: 1 Dec 2022 → 3 Dec 2022. https://www.jyu.fi/en/congress/drdhum2022 |
Conference
Conference | Digital Research Data and Human Sciences |
---|---|
Abbreviated title | DRDHum |
Country/Territory | Finland |
City | Jyväskylä |
Period | 01/12/2022 → 03/12/2022 |
Internet address |
Keywords
- piloting
- Natural Language Processing
- Social Computing
- theory-based models
- social media
- contextual interpretation