Abstract
One of the central tasks of medical text analysis is to extract and structure meaningful information from plain-text clinical documents. Named Entity Recognition (NER) is a sub-task of information extraction that involves identifying predefined entities from unstructured free text. Notably, NER models require large amounts of human-labeled data to train, but human annotation is costly and laborious and often requires medical training. Here, we aim to overcome the shortage of manually annotated data by introducing a training scheme for NER models that uses an existing medical ontology to assign weak labels to entities and provides enhanced domain-specific model adaptation with in-domain continual pretraining. Due to limited human annotation resources, we develop a specific module to collect a more representative test dataset from the data lake than a random selection. To validate our framework, we invite clinicians to annotate the test set. In this way, we construct two Finnish medical NER datasets based on clinical records retrieved from a hospital’s data lake and evaluate the effectiveness of the proposed methods. The code is available at https://github.com/VRCMF/HAM-net.git.
Original language | English |
---|---|
Title of host publication | Machine Learning and Knowledge Discovery in Databases |
Subtitle | Applied Data Science and Demo Track - European Conference, ECML PKDD 2023, Proceedings |
Editors | Gianmarco De Francisci Morales, Francesco Bonchi, Claudia Perlich, Natali Ruchansky, Nicolas Kourtellis, Elena Baralis |
Publisher | Springer |
Pages | 444-459 |
Number of pages | 16 |
ISBN (print) | 978-3-031-43426-6 |
DOI - permanent links | |
Status | Published - 2023 |
Ministry of Education and Culture (OKM) publication type | A4 Article in conference proceedings |
Event | European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases - Turin, Italy. Duration: 18 Sept 2023 → 22 Sept 2023 |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 14174 LNAI |
ISSN (print) | 0302-9743 |
ISSN (electronic) | 1611-3349 |
Conference
Conference | European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases |
---|---|
Abbreviated title | ECML PKDD |
Country/Territory | Italy |
City | Turin |
Period | 18/09/2023 → 22/09/2023 |
Fingerprint
Dive into the research topics of 'Weak Supervision and Clustering-Based Sample Selection for Clinical Named Entity Recognition'. Together they form a unique fingerprint.
CLISHEAT/Marttinen: Green and digital healthcare
Marttinen, P. (Principal investigator), Gao, Y. (Project member), John, T. (Project member) & Moen, H. (Project member)
EU The Recovery and Resilience Facility (RRF)
01/01/2023 → 31/12/2025
Project: Academy of Finland: Other research funding
-
INTERVENE: International consortium for integrative genomics prediction
Kaski, S. (Principal investigator)
01/01/2021 → 31/12/2025
Project: EU: Framework programmes funding
-
DATALIT: Data Literacy for Responsible Decision-Making
Marttinen, P. (Principal investigator), Ji, S. (Project member), Gröhn, T. (Project member), Honkamaa, J. (Project member), Kumar, Y. (Project member), Pöllänen, A. (Project member), Ojala, F. (Project member), Raj, V. (Project member) & Tiwari, P. (Project member)
01/10/2020 → 30/09/2023
Project: Academy of Finland: Strategic research funding