An OCR Pipeline for Transforming Parliamentary Debates into Linked Data: Case ParliamentSampo – Parliament of Finland on the Semantic Web

Senka Drobac, Laura Sinikallio, Eero Hyvönen

Research output: Journal article › Conference article › Scientific › Peer-reviewed

34 Downloads (Pure)

Abstract

This paper presents the OCR pipeline created for ParliamentSampo – Parliament of Finland on the Semantic Web, a Linked Open Data (LOD) service, data infrastructure, and semantic portal for studying Finnish political culture, language, and the networks of Members of Parliament (MPs). A knowledge graph of linked data has been created from ca. 967,000 speeches given in all plenary sessions of the Parliament of Finland in 1907–2022; the data is also available in XML, using the new international Parla-CLARIN format. A central part of the historical debates, from 1907–1999, was available only as PDF documents of fairly low OCR quality and had to be OCRed first; this paper reports the lessons learned from this process.
Original language: English
Pages: 287-296
Number of pages: 10
Journal: Digital Humanities in the Nordic and Baltic Countries Publications
Volume: 5
Issue number: 1
Publication status: Published - 2023
MoE publication type: A4 Article in conference proceedings
Event: Digital Humanities in the Nordic and Baltic Countries - Virtual, Online, Norway
Duration: 8 Mar 2023 to 10 Mar 2023
Conference number: 7
