Finding Nineteenth-century Berry Spots: Recognizing and Linking Place Names in a Historical Newspaper Berry-picking Corpus

Matti La Mela, Minna Tamper, Kimmo Kettunen

Research output: Article in book/conference proceedings; Conference contribution; Scientific; peer-reviewed



The paper studies and improves methods of named entity recognition (NER) and named entity linking (NEL) to facilitate historical research that uses digitized newspaper texts. The specific focus is a study of the historical process of commodification. The named entity detection pipeline is discussed in three steps. First, the paper presents the corpus, which consists of newspaper articles on wild berry picking from the late nineteenth century. Second, the paper compares two named entity recognition tools: the trainable Stanford NER and the rule-based FiNER. Third, the paper explores the linking and disambiguation of the recognized places. In the linking process, information about the newspaper's place of publication is used to improve the identification of small places.
The paper concludes that the pipeline performs well for mapping the commodification and that the specific problems relate to the recognition of place names among the named entities. Stanford NER performs better in the task (F-score of 0.83) than FiNER (F-score of 0.68). For the linking of places, newspaper metadata appears useful for disambiguating between small places. However, the historical language recognized by the Stanford model, with its OCR errors, poses challenges for the linking tool. The paper proposes that other information, for instance about the reuse of newspaper articles, could be used to further improve recognition and linking quality.
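The idea of using the newspaper's place of publication to choose between same-named small places can be illustrated with a minimal sketch. This is not the authors' implementation: the gazetteer, the place name "Koivula", the candidate identifiers, the coordinates, and the nearest-candidate rule are all assumptions made up for illustration.

```python
from math import asin, cos, radians, sin, sqrt

# Hypothetical gazetteer: one name, two candidate locations (invented data).
GAZETTEER = {
    "Koivula": [
        {"id": "koivula-south", "lat": 60.60, "lon": 22.50},
        {"id": "koivula-north", "lat": 64.90, "lon": 25.80},
    ],
}

# Approximate coordinates of two publication places (illustrative only).
PUBLICATION_PLACES = {
    "Turku": (60.45, 22.27),
    "Oulu": (65.01, 25.47),
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(a))


def link_place(name, publication_place):
    """Pick the candidate closest to the publication place (assumed rule)."""
    candidates = GAZETTEER.get(name, [])
    if not candidates:
        return None  # name not in the gazetteer: nothing to link
    origin = PUBLICATION_PLACES.get(publication_place)
    if origin is None:
        return candidates[0]  # no metadata: fall back to the first candidate
    return min(
        candidates,
        key=lambda c: haversine_km(origin[0], origin[1], c["lat"], c["lon"]),
    )


if __name__ == "__main__":
    print(link_place("Koivula", "Turku")["id"])
    print(link_place("Koivula", "Oulu")["id"])
```

The sketch assumes that a small place mentioned in a newspaper is more likely to lie near where the paper was published. A real linking tool would also need to normalize OCR-damaged spellings before the gazetteer lookup, which this example does not attempt.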
Title: DHN 2019 - Digital Humanities in the Nordic Countries
Subtitle: Proceedings of the Digital Humanities in the Nordic Countries 4th Conference, Copenhagen, Denmark, March 5-8, 2019
Editors: Costanza Navarretta, Manex Agirrezabal, Bente Maegaard
Status: Published - 2019
Ministry of Education and Culture (OKM) publication type: A4 Article in conference proceedings
Event: Digital Humanities in the Nordic Countries - University of Copenhagen, Copenhagen, Denmark
Duration: 6 Mar 2019 to 8 Mar 2019
Conference number: 4


Name: CEUR Workshop Proceedings
ISSN (print): 1613-0073
ISSN (electronic): 1613-0073


Conference: Digital Humanities in the Nordic Countries


