    To understand and study society and culture, written language is crucial. During the last 30 years, many new sources have become available to researchers as huge digital corpora, far too large to read through manually. These new sources enable research that was previously impossible, but also pose methodological challenges. How can we draw valid scientific conclusions about society and culture from huge textual corpora?

    The project will develop statistical theory, methods, and software for large textual data in close collaboration with researchers in Sociology and History. The project will use open research corpora as well as three unique and newly available corpora: (1) all articles in the four largest Swedish newspapers during 1945-2019, (2) all Swedish parliamentary proceedings 1945-2019, and, once digitized, (3) all Swedish novels from 1945-1989.
