Extraction of Entities and Concepts from Finnish Texts.

Research output: Thesis › Master's thesis

Abstract

Keywords are used in many document databases to improve search. The process of assigning keywords from controlled vocabularies to a document is called subject indexing. If the controlled vocabulary used for indexing is an ontology, with semantic relations and descriptions of concepts, the process is also called semantic annotation. In this thesis, an automatic annotation tool was created to provide documents with semantic annotations. The application links entities found in the texts to ontologies defined by the user. It is highly configurable and can be used with different Finnish texts. The application was developed as part of the WarSampo and Semantic Finlex projects and tested using Kansa Taisteli magazine articles and consolidated Finnish legislation. The quality of the automatic annotation was evaluated by measuring precision and recall against existing manual annotations. The results showed that the quality of the input text, as well as the selection and configuration of the ontologies, impacted the results.
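As a rough illustration of the evaluation described in the abstract, the sketch below computes precision and recall for one document by comparing a set of automatic annotations with a set of manual ones. The function name `precision_recall` and the `ex:` concept identifiers are hypothetical and are not taken from the thesis; the thesis may have matched annotations differently (for example, with partial or hierarchical matches).

```python
def precision_recall(automatic, manual):
    """Compare automatic annotations with manual ones for one document.

    Both arguments are iterables of concept identifiers. Exact-match
    comparison is an assumption made for this sketch.
    """
    automatic, manual = set(automatic), set(manual)
    true_positives = len(automatic & manual)
    # Precision: share of automatic annotations that are also in the manual set.
    precision = true_positives / len(automatic) if automatic else 0.0
    # Recall: share of manual annotations that the tool also produced.
    recall = true_positives / len(manual) if manual else 0.0
    return precision, recall


# Hypothetical identifiers, for illustration only.
manual = {"ex:Helsinki", "ex:WinterWar", "ex:Infantry", "ex:Mannerheim"}
automatic = {"ex:Helsinki", "ex:WinterWar", "ex:Artillery"}

p, r = precision_recall(automatic, manual)
print(f"precision={p:.2f} recall={r:.2f}")
```

Here two of the three automatic annotations are correct, and two of the four manual annotations are found.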
Original language: English
Qualification: Master's degree
Awarding Institution
  • School of Science
Supervisors/Advisors
  • Mäkelä, Eetu, Supervising Professor
  • Tuominen, Jouni, Supervising Professor
  • Hyvönen, Eero, Supervising Professor
Publication status: Published - Dec 2016
MoE publication type: G2 Master's thesis, polytechnic Master's thesis
