Sparse Bayesian Linear Models: Computational Advances and Applications in Epidemiology

Tomi Peltola

Research output: Thesis › Doctoral Thesis › Collection of Articles

Abstract

Recent advances in measurement technologies have transformed the landscape of studies of the genetic and metabolic determinants of diseases and other complex traits. DNA and blood samples can be cost- and time-efficiently interrogated for millions of genetic markers and hundreds of circulating metabolites. While the scale and unbiased nature of the characterization of the individual samples create opportunities for new discoveries, they also pose a challenge for the statistical analysis of the data. One approach to tackling these issues, and a focus of much recent research in statistical methodology, is to search for linear relationships under a sparsity assumption, that is, the presence of only a limited number of practically relevant relationships among the vast number of possibilities. This thesis studies aspects of statistical modelling and computation under the linearity and sparsity assumptions in the framework of Bayesian data analysis. First, a hierarchical extension of the spike and slab prior distribution for sparse linear regression modelling is presented, which allows additive and dominant effects in genome-wide association analysis. The model is applied to search for genetic markers related to blood cholesterol levels. A tailored, finitely adaptive Markov chain Monte Carlo algorithm is studied for the computation. Second, an approach for constructing deterministic Gaussian approximations for Bayesian linear latent variable models using the expectation propagation method is described. The main advance is an efficient numerical solution to the moment integrals for bilinear probability factors. Third, a model for predicting the risk of adverse cardiovascular events in diabetic individuals using candidate biomarkers is presented. The model is extended hierarchically to include data from non-diabetic individuals. Shrinkage priors and projective covariate selection are applied to identify biomarkers with predictive value.
The results of the studies demonstrate the benefits of hierarchical Bayesian modelling. Despite the advances here and in the literature generally, computation with sparse models and large datasets remains challenging. On the other hand, given the fast pace of development of deterministic approximation methods, assessing their role in predictive covariate selection would seem timely.
Translated title of the contribution: Harvuutta suosivat bayesilaiset lineaarimallit: laskennallisia menetelmiä ja sovelluksia epidemiologiassa
Original language: English
Qualification: Doctor's degree
Awarding Institution
  • Aalto University
Supervisors/Advisors
  • Lampinen, Jouko, Supervising Professor
  • Vehtari, Aki, Thesis Advisor
  • Marttinen, Pekka, Thesis Advisor
Publisher
Print ISBNs: 978-952-60-6011-8
Electronic ISBNs: 978-952-60-6012-5
Publication status: Published - 2014
MoE publication type: G5 Doctoral dissertation (article)

Keywords

  • Bayesian linear modelling
  • sparsity
  • Markov chain Monte Carlo
  • approximate inference
