An Anonymization Tool for Open Data Publication of Legal Documents

Arttu Oksanen, Eero Hyvönen, Minna Tamper, Jouni Tuominen, Henna Ylimaa, Katja Löytynoja, Matti Kokkonen, Aki Hietanen

Research output: Contribution to journal › Conference article › Scientific › peer-review


Abstract

The EU General Data Protection Regulation (GDPR) requires that documents containing personal data, such as court decisions, be anonymized before they are made available for public use. Doing this manually is costly and time-consuming, but the process can be automated by applying Natural Language Processing (NLP) methods. This paper introduces ANOPPI, a tool developed for (semi-)automatic anonymization of Finnish texts. The tool can be used both as a web application and programmatically through a REST API. Evaluation shows that ANOPPI performs well on different types of documents; however, further improving the performance of its named entity recognition and disambiguation methods would enhance the usefulness of the software. The Ministry of Justice in Finland is publishing the tool as open source for public use. One use case of ANOPPI is publishing court decisions on the Web in the LawSampo semantic portal, both for human close reading and as Linked Open Data for data analysis in legal informatics.
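The abstract describes the core operation only at a high level: named entities detected in a document are replaced so that the published text no longer identifies individuals. Below is a minimal sketch of that replacement (pseudonymization) step in Python, assuming the entity spans have already been produced by some NER component. The `EntitySpan` type, the label names and the `[LABEL_n]` placeholder format are hypothetical illustrations, not ANOPPI's actual interface or output format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntitySpan:
    """A detected entity: character offsets [start, end) plus a category label.

    Hypothetical type; assumes spans come from an upstream NER step.
    """
    start: int
    end: int
    label: str  # e.g. "PERSON", "ADDRESS" (hypothetical label set)


def pseudonymize(text: str, spans: list[EntitySpan]) -> str:
    """Replace each entity span with a numbered placeholder per label.

    The same surface string with the same label always receives the same
    placeholder, so repeated mentions of one person stay linked after masking.
    """
    placeholders: dict[tuple[str, str], str] = {}
    counters: dict[str, int] = {}
    pieces: list[str] = []
    cursor = 0  # end offset of the last replaced span

    # Process spans left to right so numbering follows first appearance.
    for span in sorted(spans, key=lambda s: s.start):
        if span.start < cursor:
            raise ValueError("overlapping spans are not supported")
        surface = text[span.start:span.end]
        key = (span.label, surface)
        if key not in placeholders:
            counters[span.label] = counters.get(span.label, 0) + 1
            placeholders[key] = f"[{span.label}_{counters[span.label]}]"
        pieces.append(text[cursor:span.start])  # untouched text before the span
        pieces.append(placeholders[key])
        cursor = span.end

    pieces.append(text[cursor:])  # trailing text after the last span
    return "".join(pieces)
```

For example, with `text = "Matti Virtanen sued Liisa Korhonen. Matti Virtanen appealed."` and PERSON spans at offsets (0, 14), (20, 34) and (36, 50), `pseudonymize(text, spans)` returns `"[PERSON_1] sued [PERSON_2]. [PERSON_1] appealed."`. Matching on the exact surface string is a simplifying assumption of this sketch: Finnish inflection means the same person can appear as "Virtanen" or "Virtasen", and linking such variants to one placeholder is the kind of problem named entity disambiguation addresses.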

Original language: English
Pages (from-to): 12-21
Number of pages: 10
Journal: CEUR Workshop Proceedings
Volume: 3257
Publication status: Published - 2022
MoE publication type: A4 Conference publication
Event: 3rd International Workshop on Artificial Intelligence Technologies for Legal Documents and the 1st International Workshop on Knowledge Graph Summarization - Virtual, Online, Hangzhou, China
Duration: 23 Oct 2022 to 24 Oct 2022

Keywords

  • anonymization
  • case law
  • named entity recognition
  • pseudonymization
