Identification of small molecules remains a central question in analytical chemistry, in particular for natural product research, metabolomics, environmental research, and biomarker discovery. Mass spectrometry is the predominant technique for high-throughput analysis of small molecules. But it reveals only information about the mass of molecules and, by using tandem mass spectrometry, about the mass of molecular fragments. Automated interpretation of mass spectra is often limited to searching in spectral libraries, such that we can only dereplicate molecules for which we have already recorded reference mass spectra. In my thesis “Computational methods for small molecule identification” we developed SIRIUS, a tool for the structural elucidation of small molecules with tandem mass spectrometry. The method first computes a hypothetical fragmentation tree using combinatorial optimization. By using a Bayesian statistical model, we can learn parameters and hyperparameters of the underlying scoring directly from data. We demonstrate that the statistical model, which was fitted on a small dataset, generalizes well across many different datasets and mass spectrometry instruments. In a second step the fragmentation tree is used to predict a molecular fingerprint using kernel support vector machines. The predicted fingerprint can be searched in a structure database to identify the molecular structure. We demonstrate that our machine learning model outperforms all other methods for this task, including its predecessor FingerID. SIRIUS is available as commandline tool and as user interface. The molecular fingerprint prediction is implemented as web service and receives over one million requests per month.
- Machine learning
- Mass spectrometry