Abstract
The discovery and identification of molecules in biological and environmental samples is crucial for advancing biomedical and chemical sciences. Tandem mass spectrometry (MS/MS) is the leading technique for high-throughput elucidation of molecular structures. However, decoding a molecular structure from its mass spectrum is exceptionally challenging, even when performed by human experts. As a result, the vast majority of acquired MS/MS spectra remain uninterpreted, thereby limiting our understanding of the underlying (bio)chemical processes. Despite decades of progress in machine learning applications for predicting molecular structures from MS/MS spectra, the development of new methods is severely hindered by the lack of standard datasets and evaluation protocols. To address this problem, we propose MassSpecGym, the first comprehensive benchmark for the discovery and identification of molecules from MS/MS data. Our benchmark comprises the largest publicly available collection of high-quality labeled MS/MS spectra and defines three MS/MS annotation challenges: de novo molecular structure generation, molecule retrieval, and spectrum simulation. It includes new evaluation metrics and a generalization-demanding data split, thereby standardizing the MS/MS annotation tasks and rendering the problem accessible to the broad machine learning community. MassSpecGym is publicly available at https://github.com/pluskal-lab/MassSpecGym.
Original language | English |
---|---|
Title | Advances in Neural Information Processing Systems 37 (NeurIPS 2024) |
Publisher | Curran Associates Inc. |
Pages | 1-18 |
Number of pages | 18 |
ISBN (print) | 9798331314385 |
Status | Published - 2025 |
OKM publication type | A4 Article in conference proceedings |
Event | Conference on Neural Information Processing Systems, Vancouver, Canada. Duration: 10 Dec 2024 → 15 Dec 2024. Conference number: 38. https://neurips.cc/Conferences/2024 |
Publication series
Name | Advances in Neural Information Processing Systems |
---|---|
Publisher | Neural Information Processing Systems Foundation |
ISSN (print) | 1049-5258 |
Conference
Conference | Conference on Neural Information Processing Systems |
---|---|
Abbreviation | NeurIPS |
Country/Region | Canada |
City | Vancouver |
Period | 10/12/2024 → 15/12/2024 |
Web address |