Weak Supervision and Clustering-Based Sample Selection for Clinical Named Entity Recognition

Wei Sun*, Shaoxiong Ji, Tuulia Denti, Hans Moen, Oleg Kerro, Antti Rannikko, Pekka Marttinen, Miika Koskinen

*Corresponding author for this work

Research output: Article in book/conference proceedings; Conference article in proceedings; Scientific; peer-reviewed



One of the central tasks of medical text analysis is to extract and structure meaningful information from plain-text clinical documents. Named Entity Recognition (NER) is a sub-task of information extraction that involves identifying predefined entities in unstructured free text. NER models typically require large amounts of human-labeled training data, but human annotation is costly, laborious, and often requires medical training. Here, we aim to overcome the shortage of manually annotated data by introducing a training scheme for NER models that uses an existing medical ontology to assign weak labels to entities and provides enhanced domain-specific model adaptation through in-domain continual pretraining. Because human annotation resources are limited, we develop a dedicated module that collects a more representative test dataset from the data lake than random selection would. To validate our framework, we invite clinicians to annotate the test set. In this way, we construct two Finnish medical NER datasets based on clinical records retrieved from a hospital’s data lake and evaluate the effectiveness of the proposed methods. The code is available at https://github.com/VRCMF/HAM-net.git.
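
The idea of assigning weak labels from an ontology can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that); it assumes a simple greedy longest-match dictionary lookup that emits BIO tags. The names `TOY_ONTOLOGY` and `weak_label`, the entity types, and the English example terms are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical toy ontology: lower-cased term (as a token tuple) -> entity type.
# A real medical ontology would be far larger and language-specific.
TOY_ONTOLOGY: Dict[Tuple[str, ...], str] = {
    ("prostate", "cancer"): "DISEASE",
    ("cancer",): "DISEASE",
    ("ibuprofen",): "DRUG",
}


def weak_label(tokens: List[str], ontology: Dict[Tuple[str, ...], str]) -> List[str]:
    """Assign BIO tags by greedy longest-match lookup against the ontology."""
    tags = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    max_len = max((len(term) for term in ontology), default=0)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first so multi-word terms win.
        for span in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(lowered[i:i + span])
            if key in ontology:
                label = ontology[key]
                tags[i] = f"B-{label}"
                for j in range(i + 1, i + span):
                    tags[j] = f"I-{label}"
                i += span
                matched = True
                break
        if not matched:
            i += 1
    return tags


if __name__ == "__main__":
    sentence = "Patient with prostate cancer was given ibuprofen".split()
    for token, tag in zip(sentence, weak_label(sentence, TOY_ONTOLOGY)):
        print(f"{token}\t{tag}")
```

Labels produced this way are noisy: exact matching misses inflected or misspelled terms and cannot resolve ambiguous ones, which is one reason a clinician-annotated test set is still needed for evaluation.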

Title: Machine Learning and Knowledge Discovery in Databases
Subtitle: Applied Data Science and Demo Track - European Conference, ECML PKDD 2023, Proceedings
Editors: Gianmarco De Francisci Morales, Francesco Bonchi, Claudia Perlich, Natali Ruchansky, Nicolas Kourtellis, Elena Baralis
ISBN (print): 978-3-031-43426-6
Status: Published - 2023
Ministry of Education and Culture (OKM) publication type: A4 Article in conference proceedings
Event: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases - Turin, Italy
Duration: 18 Sep 2023 to 22 Sep 2023


Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 14174 LNAI
ISSN (print): 0302-9743
ISSN (electronic): 1611-3349


Conference: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases
Abbreviation: ECML PKDD

