# Gaussian process modelling in approximate Bayesian computation to estimate horizontal gene transfer in Bacteria

Research output: Journal article

### Standard

**Gaussian process modelling in approximate Bayesian computation to estimate horizontal gene transfer in Bacteria.** / Järvenpää, Marko; Gutmann, Michael U.; Vehtari, A. K.I.; Marttinen, Pekka.

### Harvard

*Annals of Applied Statistics*, vol. 12, no. 4, pp. 2228-2251. https://doi.org/10.1214/18-AOAS1150

### APA

*Annals of Applied Statistics*, *12*(4), 2228-2251. https://doi.org/10.1214/18-AOAS1150

### RIS

TY - JOUR

T1 - Gaussian process modelling in approximate Bayesian computation to estimate horizontal gene transfer in Bacteria

AU - Järvenpää, Marko

AU - Gutmann, Michael U.

AU - Vehtari, A. K.I.

AU - Marttinen, Pekka

PY - 2018/12/1

Y1 - 2018/12/1

N2 - Approximate Bayesian computation (ABC) can be used for model fitting when the likelihood function is intractable but simulating from the model is feasible. However, even a single evaluation of a complex model may take several hours, limiting the number of model evaluations available. Modelling the discrepancy between the simulated and observed data using a Gaussian process (GP) can be used to reduce the number of model evaluations required by ABC, but the sensitivity of this approach to a specific GP formulation has not yet been thoroughly investigated. We begin with a comprehensive empirical evaluation of using GPs in ABC, including various transformations of the discrepancies and two novel GP formulations. Our results indicate the choice of GP may significantly affect the accuracy of the estimated posterior distribution. Selection of an appropriate GP model is thus important. We formulate expected utility to measure the accuracy of classifying discrepancies below or above the ABC threshold, and show that it can be used to automate the GP model selection step. Finally, based on the understanding gained with toy examples, we fit a population genetic model for bacteria, providing insight into horizontal gene transfer events within the population and from external origins.

KW - Approximate Bayesian computation

KW - Gaussian process

KW - Input-dependent noise

KW - Intractable likelihood

KW - Model selection

UR - http://www.scopus.com/inward/record.url?scp=85057174949&partnerID=8YFLogxK

U2 - 10.1214/18-AOAS1150

DO - 10.1214/18-AOAS1150

M3 - Article

VL - 12

SP - 2228

EP - 2251

JO - Annals of Applied Statistics

JF - Annals of Applied Statistics

SN - 1932-6157

IS - 4

ER -

ID: 30293047