In this paper, we discuss potential directions and implications of a short research project that set out to detect fictional content in the digitised historical newspaper archive of the National Library of Finland. Our endeavour was motivated by and oriented around an overarching question: what would literary history look like if, instead of focusing solely on canonical books, we also accounted for the works published in journals and newspapers? For pragmatic reasons, we decided to focus on poetry and to use a supervised machine learning approach. The lack of metadata denoting content structure and content type posed the biggest challenge, but the final dataset contained a total of 18,591 text blocks with poetic content, with overall precision rates verging on 90%. We argue that even these preliminary results demonstrate that studying works of fiction found in newspapers is a task worth undertaking. Moreover, the extracted corpus can already enable content-oriented research, and we discuss some methods for doing so in the article. Finally, our paper suggests that a data-rich history of Finnish newspaper literature is an attainable goal in time, and that it has great potential for challenging the current understanding of the Finnish literary past.
|Number of pages|27|
|Journal|HUMAN IT: TIDSKRIFT FÖR STUDIER AV IT UR ETT HUMANVETENSKAPLIGT PERSPEKTIV|
|Publication status|Published - 2 Jul 2019|
|MoE publication type|A1 Journal article-refereed|
- literary history
- national literature
- nineteenth century