Improved Small Molecule Identification through Learning Combinations of Kernel Regression Models

Research output: Journal article

Standard

Improved Small Molecule Identification through Learning Combinations of Kernel Regression Models. / Brouard, Celine; Basse, Antoine; d'Alche-Buc, Florence; Rousu, Juho.

In: METABOLITES, Vol. 9, No. 8, 160, 08.2019.

Author

Brouard, Celine ; Basse, Antoine ; d'Alche-Buc, Florence ; Rousu, Juho. / Improved Small Molecule Identification through Learning Combinations of Kernel Regression Models. In: METABOLITES. 2019 ; Vol. 9, No. 8.

BibTeX

@article{02c669612c2f4828b3232ecaf945abfb,
title = "Improved Small Molecule Identification through Learning Combinations of Kernel Regression Models",
abstract = "In small molecule identification from tandem mass (MS/MS) spectra, input-output kernel regression (IOKR) currently provides the state-of-the-art combination of fast training and prediction with high identification rates. The IOKR approach can be simply understood as predicting a fingerprint vector from the MS/MS spectrum of the unknown molecule and solving a pre-image problem to find the molecule with the most similar fingerprint. In this paper, we bring forward the following improvements to the IOKR framework: firstly, we formulate the IOKRreverse model, which can be understood as mapping molecular structures into the MS/MS feature space and solving a pre-image problem to find the molecule whose predicted spectrum is closest to the input MS/MS spectrum. Secondly, we introduce an approach, called IOKRfusion, to combine several IOKR and IOKRreverse models computed from different input and output kernels. The method is based on minimizing the structured hinge loss of the combined model using mini-batch stochastic subgradient optimization. Our experiments show a consistent improvement of top-k accuracy in both positive and negative ionization mode data.",
keywords = "metabolite identification, machine learning, structured prediction, kernel methods, METABOLITE IDENTIFICATION, PREDICTION",
author = "Celine Brouard and Antoine Basse and Florence d'Alche-Buc and Juho Rousu",
year = "2019",
month = "8",
doi = "10.3390/metabo9080160",
language = "English",
volume = "9",
journal = "METABOLITES",
issn = "2218-1989",
publisher = "Multidisciplinary Digital Publishing Institute (MDPI)",
number = "8",

}
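The abstract describes IOKR as predicting a fingerprint vector from the MS/MS spectrum and then solving a pre-image problem over candidate molecules, and IOKRreverse as the mirrored mapping from molecular structures into the spectrum feature space. The Python sketch below illustrates both directions with kernel ridge regression under loudly stated assumptions: a toy RBF kernel on spectrum feature vectors and a Tanimoto kernel on binary fingerprints stand in for the kernels used in the paper, candidates are scored by a plain (unnormalized) inner product in the target feature space, and all function names (`solve`, `krr_scores`, `iokr_forward`, `iokr_reverse`) are hypothetical. It is not the authors' implementation.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def rbf(a, b, gamma=1.0):
    """Assumed Gaussian kernel on toy spectrum feature vectors."""
    return math.exp(-gamma * sum((u - v) ** 2 for u, v in zip(a, b)))

def tanimoto(a, b):
    """Assumed Tanimoto kernel on binary molecular fingerprints."""
    inter = sum(1 for u, v in zip(a, b) if u and v)
    union = sum(1 for u, v in zip(a, b) if u or v)
    return inter / union if union else 1.0

def krr_scores(src_train, tgt_train, query, targets, k_src, k_tgt, lam):
    """Kernel ridge regression src -> tgt, evaluated at `query`.

    alpha = (K_src + lam I)^{-1} k_src(train, query); the score of each
    target t is sum_i alpha_i k_tgt(tgt_i, t), i.e. the inner product of
    the target's feature map with the predicted feature vector.
    """
    n = len(src_train)
    K = [[k_src(src_train[i], src_train[j]) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(K, [k_src(s, query) for s in src_train])
    return [sum(a * k_tgt(t, c) for a, t in zip(alpha, tgt_train)) for c in targets]

def iokr_forward(X, Y, x, candidates, lam=0.1):
    """IOKR-style: predict in fingerprint space from spectrum x, score candidates."""
    return krr_scores(X, Y, x, candidates, rbf, tanimoto, lam)

def iokr_reverse(X, Y, x, candidates, lam=0.1):
    """IOKRreverse-style: map each candidate into spectrum space, compare with x.

    This sketch re-solves the linear system per candidate for clarity;
    a practical version would factorize (K_Y + lam I) once.
    """
    return [krr_scores(Y, X, c, [x], tanimoto, rbf, lam)[0] for c in candidates]
```

In both directions the identified molecule is the candidate with the highest score. The forward and reverse models share the same ridge-regression machinery with the roles of input and output kernels swapped, which is what makes combining their scores (as in IOKRfusion) straightforward.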

RIS

TY - JOUR

T1 - Improved Small Molecule Identification through Learning Combinations of Kernel Regression Models

AU - Brouard, Celine

AU - Basse, Antoine

AU - d'Alche-Buc, Florence

AU - Rousu, Juho

PY - 2019/8

Y1 - 2019/8

AB - In small molecule identification from tandem mass (MS/MS) spectra, input-output kernel regression (IOKR) currently provides the state-of-the-art combination of fast training and prediction with high identification rates. The IOKR approach can be simply understood as predicting a fingerprint vector from the MS/MS spectrum of the unknown molecule and solving a pre-image problem to find the molecule with the most similar fingerprint. In this paper, we bring forward the following improvements to the IOKR framework: firstly, we formulate the IOKRreverse model, which can be understood as mapping molecular structures into the MS/MS feature space and solving a pre-image problem to find the molecule whose predicted spectrum is closest to the input MS/MS spectrum. Secondly, we introduce an approach, called IOKRfusion, to combine several IOKR and IOKRreverse models computed from different input and output kernels. The method is based on minimizing the structured hinge loss of the combined model using mini-batch stochastic subgradient optimization. Our experiments show a consistent improvement of top-k accuracy in both positive and negative ionization mode data.

KW - metabolite identification

KW - machine learning

KW - structured prediction

KW - kernel methods

KW - METABOLITE IDENTIFICATION

KW - PREDICTION

U2 - 10.3390/metabo9080160

DO - 10.3390/metabo9080160

M3 - Article

VL - 9

JO - METABOLITES

JF - METABOLITES

SN - 2218-1989

IS - 8

M1 - 160

ER -
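IOKRfusion is described in the abstract as combining several IOKR and IOKRreverse models by minimizing a structured hinge loss with mini-batch stochastic subgradient optimization. The sketch below shows that general technique under hypothetical assumptions: each model contributes one score per candidate, the combined score is a non-negative weighted sum of those scores, the margin is a fixed 0/1 loss, and the weights carry an L2 penalty. The names `hinge_step` and `train_fusion` and all hyperparameters are illustrative; the paper's exact parameterization may differ.

```python
import random

def hinge_step(w, feats, true_idx, margin=1.0):
    """Loss-augmented inference for one training spectrum.

    feats[c] holds the per-model scores of candidate c (assumed layout).
    Returns (structured hinge loss, index of the most violating candidate).
    """
    def score(c):
        return sum(wm * f for wm, f in zip(w, feats[c]))

    def augmented(c):
        # 0/1 margin: every wrong candidate must be beaten by `margin`
        return score(c) + (0.0 if c == true_idx else margin)

    best = max(range(len(feats)), key=augmented)
    return augmented(best) - score(true_idx), best

def train_fusion(data, n_models, epochs=50, batch=2, lr=0.1, reg=0.01, seed=0):
    """Mini-batch stochastic subgradient descent on the regularized hinge loss."""
    rng = random.Random(seed)
    w = [1.0 / n_models] * n_models
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for start in range(0, len(order), batch):
            idx = order[start:start + batch]
            grad = [reg * wm for wm in w]  # subgradient of (reg/2)||w||^2
            for i in idx:
                feats, t = data[i]
                loss, best = hinge_step(w, feats, t)
                if loss > 0:
                    # subgradient of the hinge: s(x, y_hat) - s(x, y_true)
                    for m in range(n_models):
                        grad[m] += (feats[best][m] - feats[t][m]) / len(idx)
            # gradient step followed by projection onto w >= 0 (assumption)
            w = [max(0.0, wm - lr * g) for wm, g in zip(w, grad)]
    return w
```

With one informative model (the correct candidate scores highest) and one misleading model, the learned weights shift toward the informative model and the misleading weight is driven to zero by the projection.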

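The reported improvements are in top-k accuracy: the fraction of test spectra whose correct molecule appears among the k highest-scoring candidates. A small helper for that metric is sketched below; the name `top_k_accuracy` is hypothetical, and ties are resolved in favor of the correct candidate, which is an assumption since evaluation protocols differ on tie handling.

```python
def top_k_accuracy(score_lists, true_indices, k):
    """Fraction of queries whose correct candidate ranks within the top k.

    score_lists[q] holds the candidate scores for query q, and
    true_indices[q] is the index of the correct candidate.
    """
    hits = 0
    for scores, t in zip(score_lists, true_indices):
        # rank = number of candidates scoring strictly higher (optimistic ties)
        rank = sum(1 for s in scores if s > scores[t])
        if rank < k:
            hits += 1
    return hits / len(true_indices)
```

For three queries whose correct candidates rank first, second, and third, the helper gives 1/3 for k = 1, 2/3 for k = 2, and 1.0 for k = 3.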