mGPfusion: Predicting protein stability changes with Gaussian process kernel learning and data fusion

Research output: Contribution to journal › Article

BibTeX

@article{f01f08610df143148b9e3a067a752f3a,
title = "mGPfusion: Predicting protein stability changes with Gaussian process kernel learning and data fusion",
abstract = "Motivation: Proteins are commonly used by biochemical industry for numerous processes. Refining these proteins' properties via mutations causes stability effects as well. Accurate computational method to predict how mutations affect protein stability is necessary to facilitate efficient protein design. However, accuracy of predictive models is ultimately constrained by the limited availability of experimental data. Results: We have developed mGPfusion, a novel Gaussian process (GP) method for predicting protein's stability changes upon single and multiple mutations. This method complements the limited experimental data with large amounts of molecular simulation data. We introduce a Bayesian data fusion model that re-calibrates the experimental and in silico data sources and then learns a predictive GP model from the combined data. Our protein-specific model requires experimental data only regarding the protein of interest and performs well even with few experimental measurements. The mGPfusion models proteins by contact maps and infers the stability effects caused by mutations with a mixture of graph kernels. Our results show that mGPfusion outperforms state-of-the-art methods in predicting protein stability on a dataset of 15 different proteins and that incorporating molecular simulation data improves the model learning and prediction accuracy.",
author = "Emmi Jokinen and Markus Heinonen and Harri L{\"a}hdesm{\"a}ki",
year = "2018",
month = "7",
day = "1",
doi = "10.1093/bioinformatics/bty238",
language = "English",
volume = "34",
pages = "i274--i283",
journal = "Bioinformatics",
issn = "1367-4803",
number = "13",

}

RIS

TY - JOUR

T1 - mGPfusion

T2 - Predicting protein stability changes with Gaussian process kernel learning and data fusion

AU - Jokinen, Emmi

AU - Heinonen, Markus

AU - Lähdesmäki, Harri

PY - 2018/7/1

Y1 - 2018/7/1

N2 - Motivation: Proteins are commonly used by biochemical industry for numerous processes. Refining these proteins' properties via mutations causes stability effects as well. Accurate computational method to predict how mutations affect protein stability is necessary to facilitate efficient protein design. However, accuracy of predictive models is ultimately constrained by the limited availability of experimental data. Results: We have developed mGPfusion, a novel Gaussian process (GP) method for predicting protein's stability changes upon single and multiple mutations. This method complements the limited experimental data with large amounts of molecular simulation data. We introduce a Bayesian data fusion model that re-calibrates the experimental and in silico data sources and then learns a predictive GP model from the combined data. Our protein-specific model requires experimental data only regarding the protein of interest and performs well even with few experimental measurements. The mGPfusion models proteins by contact maps and infers the stability effects caused by mutations with a mixture of graph kernels. Our results show that mGPfusion outperforms state-of-the-art methods in predicting protein stability on a dataset of 15 different proteins and that incorporating molecular simulation data improves the model learning and prediction accuracy.

AB - Motivation: Proteins are commonly used by biochemical industry for numerous processes. Refining these proteins' properties via mutations causes stability effects as well. Accurate computational method to predict how mutations affect protein stability is necessary to facilitate efficient protein design. However, accuracy of predictive models is ultimately constrained by the limited availability of experimental data. Results: We have developed mGPfusion, a novel Gaussian process (GP) method for predicting protein's stability changes upon single and multiple mutations. This method complements the limited experimental data with large amounts of molecular simulation data. We introduce a Bayesian data fusion model that re-calibrates the experimental and in silico data sources and then learns a predictive GP model from the combined data. Our protein-specific model requires experimental data only regarding the protein of interest and performs well even with few experimental measurements. The mGPfusion models proteins by contact maps and infers the stability effects caused by mutations with a mixture of graph kernels. Our results show that mGPfusion outperforms state-of-the-art methods in predicting protein stability on a dataset of 15 different proteins and that incorporating molecular simulation data improves the model learning and prediction accuracy.

UR - http://www.scopus.com/inward/record.url?scp=85050799574&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/bty238

DO - 10.1093/bioinformatics/bty238

M3 - Article

VL - 34

SP - i274

EP - i283

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 13

ER -