TY - CHAP
T1 - MeSHLabeler and DeepMeSH
T2 - Recent progress in large-scale MeSH indexing
AU - Peng, Shengwen
AU - Mamitsuka, Hiroshi
AU - Zhu, Shanfeng
PY - 2018/1/1
Y1 - 2018/1/1
N2 - The US National Library of Medicine (NLM) uses the Medical Subject Headings (MeSH) (see Note 1) to index almost all 24 million citations in MEDLINE, which greatly facilitates biomedical information retrieval and text mining. Large-scale automatic MeSH indexing has two challenging aspects: the MeSH side and the citation side. On the MeSH side, each citation is annotated with only 12 (on average) of the 28,000 MeSH terms. On the citation side, all existing methods, including the Medical Text Indexer (MTI) by NLM, represent text as a bag of words, which cannot capture semantic and context-dependent information well. To address these two challenges, we developed MeSHLabeler and DeepMeSH. By utilizing the “learning to rank” (LTR) framework, MeSHLabeler integrates multiple types of information to address the challenge on the MeSH side, while DeepMeSH integrates deep semantic representation to address the challenge on the citation side. MeSHLabeler achieved first place in both the BioASQ2 and BioASQ3 challenges, and DeepMeSH achieved first place in both the BioASQ4 and BioASQ5 challenges. DeepMeSH is available at http://datamining-iip.fudan.edu.cn/deepmesh.
AB - The US National Library of Medicine (NLM) uses the Medical Subject Headings (MeSH) (see Note 1) to index almost all 24 million citations in MEDLINE, which greatly facilitates biomedical information retrieval and text mining. Large-scale automatic MeSH indexing has two challenging aspects: the MeSH side and the citation side. On the MeSH side, each citation is annotated with only 12 (on average) of the 28,000 MeSH terms. On the citation side, all existing methods, including the Medical Text Indexer (MTI) by NLM, represent text as a bag of words, which cannot capture semantic and context-dependent information well. To address these two challenges, we developed MeSHLabeler and DeepMeSH. By utilizing the “learning to rank” (LTR) framework, MeSHLabeler integrates multiple types of information to address the challenge on the MeSH side, while DeepMeSH integrates deep semantic representation to address the challenge on the citation side. MeSHLabeler achieved first place in both the BioASQ2 and BioASQ3 challenges, and DeepMeSH achieved first place in both the BioASQ4 and BioASQ5 challenges. DeepMeSH is available at http://datamining-iip.fudan.edu.cn/deepmesh.
KW - Machine learning
KW - Medical subject headings
KW - MEDLINE
KW - MeSH indexing
KW - Multi-label classification
KW - Text categorization
UR - http://www.scopus.com/inward/record.url?scp=85051215317&partnerID=8YFLogxK
U2 - 10.1007/978-1-4939-8561-6_15
DO - 10.1007/978-1-4939-8561-6_15
M3 - Chapter
AN - SCOPUS:85051215317
SN - 978-1-4939-8560-9
T3 - Methods in Molecular Biology
SP - 203
EP - 209
BT - Methods in Molecular Biology
PB - Springer
ER -