TY - JOUR
T1 - Linked Data - A Paradigm Shift for Publishing and Using Biography Collections on the Semantic Web
AU - Hyvönen, Eero
AU - Leskinen, Petri
AU - Tamper, Minna
AU - Rantala, Heikki
AU - Ikkala, Esko
AU - Tuominen, Jouni
AU - Keravuori, Kirsi
N1 - Funding Information:
critical edition, genealogical data, and various biographical data sources and semantic portals online. Another difference is that in our work, a main goal has been to develop and provide versatile DH tooling for end-users on top of a Linked Data SPARQL endpoint. This paper presented and demonstrated the vision of a paradigm shift in publishing biography collections on the Semantic Web. The vision has also been operationalized and implemented as the semantic portal BIOGRAPHYSAMPO, now in use on the Web by thousands of users. The biographical data of the portal was extracted and aggregated automatically by the computer and has not been fully validated by human experts, which would be impossible due to the amount and complexity of the big data. This is a typical situation in DH research, and calls for using more source criticism when interpreting the analyses than when dealing with human-curated datasets. The quality and completeness of the BIOGRAPHYSAMPO data have not yet been analyzed formally, but our informal tests suggest that the results are very useful even if errors are also encountered. This is the price to be paid for advanced end-user services and distant reading on distributed heterogeneous biographical data. Acknowledgements: This research was part of the Severi project, funded mainly by Business Finland. Thanks to CSC – IT Center for Science, Finland, for computational server resources for the data service and applications.
Publisher Copyright:
Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
PY - 2022
Y1 - 2022
N2 - This paper argues for making a paradigm shift in publishing and using biographical dictionaries on the Web, based on Linked Data. The idea is to represent biographical data in a harmonized, semantically interoperable form, which enables 1) data enrichment by aggregating linked content from complementary, distributed, and heterogeneous data sources, as well as by reasoning, and 2) development of intelligent services using machine “understandable” data. Based on the aggregated global knowledge graph, published in a SPARQL endpoint, tooling for 1) biographical research of individual persons as well as for 2) prosopographical research on groups of people can be provided. As a demonstration of these ideas, we discuss the new in-use linked data service and semantic portal BIOGRAPHYSAMPO - Finnish Biographies on the Semantic Web that quickly attracted thousands of end users on the Web. This semantic portal is based on a knowledge graph extracted automatically from a collection of 13 100 textual biographies, written by 980 scholars. The texts are enriched with data linking to 16 external data sources and by harvesting external collection data from libraries, museums, and archives. Reasoning is used for query expansion and for discovering serendipitous relations between entities, such as persons and places.
UR - http://www.scopus.com/inward/record.url?scp=85132264746&partnerID=8YFLogxK
UR - http://ceur-ws.org/Vol-3152/
M3 - Conference article
AN - SCOPUS:85132264746
SN - 1613-0073
VL - 3152
SP - 16
EP - 23
JO - CEUR Workshop Proceedings
JF - CEUR Workshop Proceedings
T2 - Biographical Data in a Digital World
Y2 - 5 September 2019 through 6 September 2019
ER -