Data Used In "Fast Metabolite Identification With Input Output Kernel Regression"

  • Celine Brouard (Contributor)
  • Huibin Shen (Contributor)
  • Kai Dührkop (Contributor)
  • Florence d'Alché-Buc (Telecom Paris Tech, ComUE Paris-Saclay) (Contributor)
  • Sebastian Böcker (Contributor)
  • Juho Rousu (Contributor)



<p>This repository contains the data used in [1] to evaluate metabolite identification performance on tandem mass spectra. These data were extracted and processed in [2]. We used a subset of 4138 MS/MS spectra from the GNPS public spectral library for training and evaluation. For searching, we used molecular structures from PubChem as candidate sets.</p>

<p>Please mention and cite GNPS when using these data.</p>

<p>The implementation of the method proposed in [1] is available on:</p>

<p><strong>File descriptions:</strong></p>

<ul>
<li><em>spectra.txt</em>: information about the MS/MS spectra (GNPS identifier, compound name and InChI identifier)</li>
<li><em>data_GNPS.mat</em>: the molecular fingerprints, molecular formulas and InChI identifiers corresponding to the MS/MS spectra</li>
<li><em>cv_ind.txt</em>: indices of the cross-validation folds</li>
<li><em>ind_eval.txt</em>: indices of the examples used for evaluation</li>
<li><em>candidates</em>: fingerprints and InChI identifiers for the different candidate sets</li>
<li><em>input_kernels</em>: 24 input kernel matrices</li>
</ul>
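<p>As an illustration of how the fold file might be used, the sketch below turns fold labels into train/test index splits. It rests on assumptions that are not confirmed by this description: that <em>cv_ind.txt</em> holds one numeric fold label per spectrum, separated by whitespace, in the same order as the spectra. The function names <code>read_fold_labels</code> and <code>folds_to_splits</code> are hypothetical helpers, not part of the published implementation. Check the actual file layout before relying on it.</p>

```python
from collections import defaultdict
from pathlib import Path


def read_fold_labels(path):
    """Read whitespace-separated fold labels (ASSUMED layout of cv_ind.txt).

    Labels written as floats (e.g. "1.0" or "1.0e+00") are converted to ints.
    """
    text = Path(path).read_text()
    return [int(float(token)) for token in text.split()]


def folds_to_splits(labels):
    """Yield (fold, train_indices, test_indices) using 0-based example indices."""
    by_fold = defaultdict(list)
    for index, fold in enumerate(labels):
        by_fold[fold].append(index)
    for fold in sorted(by_fold):
        test = by_fold[fold]
        # Every example whose label differs from the held-out fold is training data.
        train = [i for i, f in enumerate(labels) if f != fold]
        yield fold, train, test


if __name__ == "__main__":
    fold_file = Path("cv_ind.txt")  # file name taken from the list above
    if fold_file.exists():
        labels = read_fold_labels(fold_file)
        for fold, train, test in folds_to_splits(labels):
            print(f"fold {fold}: {len(train)} training, {len(test)} test examples")
```

<p>The indices produced here are 0-based. If the files were exported from MATLAB, any indices stored inside them (for example in <em>ind_eval.txt</em>) may be 1-based and would need shifting before use in Python.</p>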


<p>[1] Brouard, C., Shen, H., Dührkop, K., d'Alché-Buc, F., Böcker, S. and Rousu, J.: Fast metabolite identification with Input Output Kernel Regression. In the proceedings of ISMB 2016, Bioinformatics 32(12): i28-i36, 2016. DOI:</p>

<p>[2] Dührkop, K., Shen, H., Meusel, M., Rousu, J. and Böcker, S.: Searching molecular structure databases with tandem mass spectra using CSI:FingerID. PNAS, 112(41), 12580-12585, 2015. doi:10.1073/pnas.1509788112</p>

