ADAPTIVE: LeArning DAta-dePendenT, concIse molecular VEctors for fast, accurate metabolite identification from tandem mass spectra

Research output: Journal article › Peer-reviewed

Standard

ADAPTIVE: LeArning DAta-dePendenT, concIse molecular VEctors for fast, accurate metabolite identification from tandem mass spectra. / Nguyen, Dai Hai; Nguyen, Canh Hao; Mamitsuka, Hiroshi.

In: Bioinformatics, Vol. 35, No. 14, btz319, 15.07.2019, p. i164-i172.

BibTeX

@article{b03bde10a7e34eb5a06fac265c5c3631,
title = "{ADAPTIVE}: {LeArning} {DAta-dePendenT}, {concIse} molecular {VEctors} for fast, accurate metabolite identification from tandem mass spectra",
abstract = "Motivation: Metabolite identification is an important task in metabolomics to enhance the knowledge of biological systems. There have been a number of machine learning-based methods proposed for this task, which predict a chemical structure of a given spectrum through an intermediate (chemical structure) representation called molecular fingerprints. They usually have two steps: (i) predicting fingerprints from spectra; (ii) searching chemical compounds (in database) corresponding to the predicted fingerprints. Fingerprints are feature vectors, which are usually very large to cover all possible substructures and chemical properties, and therefore heavily redundant, in the sense of having many molecular (sub)structures irrelevant to the task, causing limited predictive performance and slow prediction. Results: We propose ADAPTIVE, which has two parts: learning two mappings (i) from structures to molecular vectors and (ii) from spectra to molecular vectors. The first part learns molecular vectors for metabolites from given data, to be consistent with both spectra and chemical structures of metabolites. In more detail, molecular vectors are generated by a model, being parameterized by a message passing neural network, and parameters are estimated by maximizing the correlation between molecular vectors and the corresponding spectra in terms of Hilbert-Schmidt Independence Criterion. Molecular vectors generated by this model are compact and importantly adaptive (specific) to both given data and task of metabolite identification. The second part uses input output kernel regression (IOKR), the current cutting-edge method of metabolite identification. We empirically confirmed the effectiveness of ADAPTIVE by using a benchmark data, where ADAPTIVE outperformed the original IOKR in both predictive performance and computational efficiency.",
author = "Nguyen, {Dai Hai} and Nguyen, {Canh Hao} and Hiroshi Mamitsuka",
year = "2019",
month = "7",
day = "15",
doi = "10.1093/bioinformatics/btz319",
language = "English",
volume = "35",
pages = "i164--i172",
journal = "Bioinformatics",
issn = "1367-4803",
number = "14",

}
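The abstract above names the Hilbert-Schmidt Independence Criterion (HSIC) as the quantity maximized between molecular vectors and spectra. As a rough illustration only, the sketch below computes the standard biased empirical HSIC estimate, tr(KHLH)/(n-1)^2, from two Gram matrices. The Gaussian kernel on spectra, the linear kernel on molecular vectors, the bandwidth `sigma`, and all function names are assumptions made for illustration; this is not the authors' code, and the message passing neural network that produces the vectors is not shown.

```python
import math

def gaussian_gram(rows, sigma=1.0):
    """Gram matrix K[i][j] = exp(-||x_i - x_j||^2 / (2 * sigma^2)) over the rows."""
    return [[math.exp(-sum((a - b) ** 2 for a, b in zip(x, z)) / (2 * sigma ** 2))
             for z in rows] for x in rows]

def linear_gram(rows):
    """Gram matrix L[i][j] = <v_i, v_j>, i.e. a linear kernel on the rows."""
    return [[sum(a * b for a, b in zip(u, v)) for v in rows] for u in rows]

def center(K):
    """Double centering H K H with H = I - (1/n) 11^T."""
    n = len(K)
    row_means = [sum(row) / n for row in K]
    col_means = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_means) / n
    return [[K[i][j] - row_means[i] - col_means[j] + grand_mean
             for j in range(n)] for i in range(n)]

def hsic(K, L):
    """Biased empirical HSIC estimate tr(K H L H) / (n - 1)^2 for n x n Gram matrices."""
    n = len(K)
    Kc, Lc = center(K), center(L)
    # tr(HKH HLH) equals tr(KHLH) because H is idempotent; for symmetric
    # matrices the trace of a product is the sum of elementwise products.
    return sum(Kc[i][j] * Lc[i][j] for i in range(n) for j in range(n)) / (n - 1) ** 2

# Toy example: four 1-D "spectra" and two hypothetical sets of molecular vectors.
spectra = [[0.0], [1.0], [2.0], [3.0]]
K = gaussian_gram(spectra)
print(hsic(K, linear_gram(spectra)))       # vectors that track the spectra: positive
print(hsic(K, linear_gram([[1.0]] * 4)))   # constant vectors: zero
```

Maximizing this score over the parameters of a vector-generating model, as the abstract describes, would additionally need differentiable code and an optimizer, which this sketch deliberately leaves out.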

RIS

TY - JOUR

T1 - ADAPTIVE: LeArning DAta-dePendenT, concIse molecular VEctors for fast, accurate metabolite identification from tandem mass spectra

AU - Nguyen, Dai Hai

AU - Nguyen, Canh Hao

AU - Mamitsuka, Hiroshi

PY - 2019/7/15

Y1 - 2019/7/15

N2 - Motivation: Metabolite identification is an important task in metabolomics to enhance the knowledge of biological systems. There have been a number of machine learning-based methods proposed for this task, which predict a chemical structure of a given spectrum through an intermediate (chemical structure) representation called molecular fingerprints. They usually have two steps: (i) predicting fingerprints from spectra; (ii) searching chemical compounds (in database) corresponding to the predicted fingerprints. Fingerprints are feature vectors, which are usually very large to cover all possible substructures and chemical properties, and therefore heavily redundant, in the sense of having many molecular (sub)structures irrelevant to the task, causing limited predictive performance and slow prediction. Results: We propose ADAPTIVE, which has two parts: learning two mappings (i) from structures to molecular vectors and (ii) from spectra to molecular vectors. The first part learns molecular vectors for metabolites from given data, to be consistent with both spectra and chemical structures of metabolites. In more detail, molecular vectors are generated by a model, being parameterized by a message passing neural network, and parameters are estimated by maximizing the correlation between molecular vectors and the corresponding spectra in terms of Hilbert-Schmidt Independence Criterion. Molecular vectors generated by this model are compact and importantly adaptive (specific) to both given data and task of metabolite identification. The second part uses input output kernel regression (IOKR), the current cutting-edge method of metabolite identification. We empirically confirmed the effectiveness of ADAPTIVE by using a benchmark data, where ADAPTIVE outperformed the original IOKR in both predictive performance and computational efficiency.

AB - Motivation: Metabolite identification is an important task in metabolomics to enhance the knowledge of biological systems. There have been a number of machine learning-based methods proposed for this task, which predict a chemical structure of a given spectrum through an intermediate (chemical structure) representation called molecular fingerprints. They usually have two steps: (i) predicting fingerprints from spectra; (ii) searching chemical compounds (in database) corresponding to the predicted fingerprints. Fingerprints are feature vectors, which are usually very large to cover all possible substructures and chemical properties, and therefore heavily redundant, in the sense of having many molecular (sub)structures irrelevant to the task, causing limited predictive performance and slow prediction. Results: We propose ADAPTIVE, which has two parts: learning two mappings (i) from structures to molecular vectors and (ii) from spectra to molecular vectors. The first part learns molecular vectors for metabolites from given data, to be consistent with both spectra and chemical structures of metabolites. In more detail, molecular vectors are generated by a model, being parameterized by a message passing neural network, and parameters are estimated by maximizing the correlation between molecular vectors and the corresponding spectra in terms of Hilbert-Schmidt Independence Criterion. Molecular vectors generated by this model are compact and importantly adaptive (specific) to both given data and task of metabolite identification. The second part uses input output kernel regression (IOKR), the current cutting-edge method of metabolite identification. We empirically confirmed the effectiveness of ADAPTIVE by using a benchmark data, where ADAPTIVE outperformed the original IOKR in both predictive performance and computational efficiency.

UR - http://www.scopus.com/inward/record.url?scp=85068907000&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btz319

DO - 10.1093/bioinformatics/btz319

M3 - Article

VL - 35

SP - i164

EP - i172

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 14

M1 - btz319

ER -
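Steps (i) and (ii) in the abstract, predicting a vector from a spectrum and then searching database candidates with it, can be illustrated with a heavily simplified, IOKR-style scorer: kernel ridge regression from spectra to molecular vectors, followed by ranking candidates with a linear output kernel. This is a sketch under stated assumptions, not the IOKR implementation used in the paper: spectra are assumed to be fixed-length numeric vectors, the Gaussian input kernel, the linear output kernel, the regularization `lam`, the bandwidth `sigma`, and the names `predict_vector` and `rank_candidates` are all hypothetical.

```python
import math

def gaussian_kernel(x, z, sigma=1.0):
    """Gaussian kernel between two fixed-length spectrum vectors (assumed input kernel)."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, z)) / (2 * sigma ** 2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def predict_vector(train_spectra, train_vectors, query, lam=0.1, sigma=1.0):
    """Step (i): kernel ridge regression, v_hat = V^T (K + lam I)^{-1} k(query)."""
    K = [[gaussian_kernel(a, b, sigma) + (lam if i == j else 0.0)
          for j, b in enumerate(train_spectra)] for i, a in enumerate(train_spectra)]
    k_q = [gaussian_kernel(s, query, sigma) for s in train_spectra]
    alpha = solve(K, k_q)
    dim = len(train_vectors[0])
    return [sum(alpha[i] * train_vectors[i][d] for i in range(len(alpha))) for d in range(dim)]

def rank_candidates(v_hat, candidates):
    """Step (ii): order candidates by the linear output-kernel score <v_c, v_hat>."""
    scores = {name: sum(a * b for a, b in zip(vec, v_hat)) for name, vec in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy data: three training spectra, each paired with a one-hot molecular vector.
train_spectra = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
train_vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
database = {"A": [1.0, 0.0, 0.0], "B": [0.0, 1.0, 0.0], "C": [0.0, 0.0, 1.0]}
v_hat = predict_vector(train_spectra, train_vectors, [0.0, 1.0])
print(rank_candidates(v_hat, database))  # "A", paired with the matching spectrum, ranks first
```

With a linear output kernel the predicted vector can be written down explicitly, which keeps the sketch short; a nonlinear or normalized output kernel would instead score each candidate through kernel evaluations against the training outputs.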

ID: 35580648