Computational methods for small molecule identification

Kai Dührkop*

*Corresponding author for this work

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

Identification of small molecules remains a central question in analytical chemistry, in particular for natural product research, metabolomics, environmental research, and biomarker discovery. Mass spectrometry is the predominant technique for high-throughput analysis of small molecules. However, it reveals only the mass of molecules and, when tandem mass spectrometry is used, the masses of molecular fragments. Automated interpretation of mass spectra is often limited to searching in spectral libraries, so that we can only dereplicate molecules for which reference mass spectra have already been recorded. In my thesis “Computational methods for small molecule identification” we developed SIRIUS, a tool for the structural elucidation of small molecules with tandem mass spectrometry. The method first computes a hypothetical fragmentation tree using combinatorial optimization. By using a Bayesian statistical model, we can learn the parameters and hyperparameters of the underlying scoring directly from data. We demonstrate that the statistical model, although fitted on a small dataset, generalizes well across many different datasets and mass spectrometry instruments. In a second step, the fragmentation tree is used to predict a molecular fingerprint using kernel support vector machines. The predicted fingerprint can then be searched in a structure database to identify the molecular structure. We demonstrate that our machine learning model outperforms all other methods for this task, including its predecessor FingerID. SIRIUS is available as a command-line tool and with a graphical user interface. The molecular fingerprint prediction is implemented as a web service and receives over one million requests per month.
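
The final step described above, searching a predicted fingerprint in a structure database, can be illustrated with a minimal sketch. It assumes binary fingerprints stored as sets of property indices and ranks candidates by Tanimoto similarity. This is a simplified stand-in chosen for illustration, not necessarily the scoring SIRIUS uses, and all names and data here (`tanimoto`, `rank_candidates`, the toy database) are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# A fingerprint is modeled as the set of indices of molecular
# properties that are present (an assumption made for this sketch).
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity of two binary fingerprints."""
    union = len(a | b)
    # Two empty fingerprints are treated as identical.
    return len(a & b) / union if union else 1.0


def rank_candidates(
    predicted: Fingerprint, database: Dict[str, Fingerprint]
) -> List[Tuple[str, float]]:
    """Return database entries sorted by decreasing similarity."""
    scored = [(name, tanimoto(predicted, fp)) for name, fp in database.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy data: a hypothetical predicted fingerprint and three candidates.
predicted = frozenset({0, 2, 5, 7})
database = {
    "candidate_A": frozenset({0, 2, 5, 7, 9}),
    "candidate_B": frozenset({1, 3, 4}),
    "candidate_C": frozenset({0, 2, 6}),
}

ranking = rank_candidates(predicted, database)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In this toy example the top-ranked entry is the candidate that shares the most properties with the predicted fingerprint. A real system would also have to account for the uncertainty of each predicted property, which this sketch ignores.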

Original language: English
Journal: IT - Information Technology
Volume: 61
Issue number: 5-6
Publication status: Published - 24 Oct 2019
MoE publication type: A1 Journal article-refereed

Keywords

  • Bioinformatics
  • Machine learning
  • Mass spectrometry
  • Metabolomics
