Description

This repository describes and stores text corpora of AI discourse from 2020-2024. The corpora were collected as part of the Civic Agency in AI (CAAI) project at the Department of Computer Science, Aalto University, funded by the Kone Foundation and the Research Council of Finland.

The corpora constitute the data collected and examined for the PhD thesis of doctoral candidate (DC) Kaisla Kajava, as reported in the following publications:



1. Kajava, K; Öhman, E; Takagi, N M; Nakajima-Wickham, E; Vitiugin, F. 2025. Post-GPT Policy: Risk and Regulation in EU AI Discourse. In: Proceedings of the International AAAI Conference on Web and Social Media. AAAI Press, Volume 19, pages 994-1006. eISSN 3067-1515. 10.1609/icwsm.v19i1.35856.

2. Anonymized, for peer review.

3. Kajava, K; Gonzalez Torres, A P; Rannisto, A; Sakai, S. 2025. Justifying AI Regulation: Examining Multi-Stakeholder Responses to the AI Act. Telematics and Informatics, Volume 99, June 2025, 102278. eISSN 1879-324X. 10.1016/j.tele.2025.102278.

4. Gonzalez Torres, A P; Kajava, K; Sawhney, N. 2023. Emerging AI Discourses and Policies in the EU: Implications for Evolving AI Governance. In: Pillay, A; Jembere, E; Gerber, A J (eds.). Artificial Intelligence Research. SACAIR 2023. Communications in Computer and Information Science. Springer, Volume 1976, pages 3-17. eISSN 1865-0937. 10.1007/978-3-031-49002-6_1.

5. Kajava, K; Sawhney, N. 2023. Language of Algorithms: Agency, Metaphors, and Deliberations in AI Discourses. In: Lindgren, S (ed.). Handbook of Critical Studies of Artificial Intelligence. Edward Elgar, pages 224-236. eISBN 9781803928562. 10.4337/9781803928562.00025.


The corpora compiled in the project are:

A corpus of 20 AI Watch reports (2020-2021), retrieved from and available for download at: https://ai-watch.ec.europa.eu/publications_en (Publication 5).

A corpus of 128 feedback documents submitted by industry, civil society, public sector, and research organizations in response to the initial proposal for the EU AI Act (2021), retrieved from and available for download at: https://ec.europa.eu/info/law/better-regulation/have-your-say/initiatives/12527-Artificial-intelligence-ethical-and-legal-requirements/feedback_en?p_id=14488 (Publications 3, 4, 5).

A corpus of 806 AI policy-related documents (2021-2023), comprising 432 news articles and 374 other policy-related documents. The data are available at: https://github.com/vitiugin/eu-ai-discourse (Publication 1).

A corpus <anonymized for peer review>. (Publication 2).


If you use these corpora in your research, please cite the relevant publication(s) listed above. Each corpus is described in detail in its respective publication.
Date made available: 21 Oct 2025
Publisher: Zenodo

Dataset Licences

  • CC-BY-4.0
