Hiding in Plain Sight: Poetry in Newspapers and How to Approach It

Agata Dominowska, Elsi Hyttinen, Péter Ivanics, Mikko Koho, Ilona Pikkanen, Risto Turunen

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

In this paper, we discuss potential directions and implications of a short research project that set out to detect fictional content in the digitised historical newspaper archive of the National Library of Finland. Our endeavour was motivated by and oriented around an overarching question: what would literary history look like if, instead of focusing solely on canonical books, we also accounted for the works published in journals and newspapers? For pragmatic reasons, we decided to focus on poetry and to use a supervised machine learning approach. The lack of metadata denoting content structure and content type posed the biggest challenge; nevertheless, the final dataset grew to a total of 18,591 text blocks with poetic content, with overall precision rates verging on 90%. We argue that even these preliminary results demonstrate that studying works of fiction found in newspapers is a task worth undertaking. Moreover, the extracted corpus can already support content-oriented research, and we discuss some methods for doing so in the article. Finally, our paper suggests that a data-rich history of Finnish newspaper literature is attainable in time, and that it has great potential for challenging the current understanding of the Finnish literary past.
Original language: English
Pages (from-to): 145-171
Number of pages: 27
Journal: HUMAN IT: TIDSKRIFT FÖR STUDIER AV IT UR ETT HUMANVETENSKAPLIGT PERSPEKTIV
Publication status: Published - 2 Jul 2019
MoE publication type: A1 Journal article-refereed

Keywords

  • literary history
  • national literature
  • poetry
  • Finland
  • genre-detection
  • nineteenth century
