Partitive Techniques in Bayesian Data Analysis

Tuomas Sivula

Research output: Thesis > Doctoral Thesis > Collection of Articles

Abstract

This dissertation analyses two popular methods used in Bayesian data analysis that involve splitting the data set into disjoint sets. The analysed approximative methods are expectation propagation (EP) and leave-one-out cross-validation (LOO-CV), which are used in the context of distributed inference and model evaluation/comparison, respectively. The main contribution of the dissertation is in analysing the applicability and behaviour of the methods in different situations. The EP algorithm is a popular method for approximating a factorisable density. In the Bayesian context, for tractability, it has usually been applied pointwise. However, by including multiple observations in one approximated factor component, the method can be seen as a flexible framework for distributed inference. In addition, in hierarchical settings, it provides a convenient means of dimension reduction by concentrating parameter inferences in separate units. LOO-CV is a popular method used in model evaluation, comparison, and weighting for estimating the out-of-sample predictive performance of a model using the given observations. In some situations, obtaining the estimate is computationally heavy. The dissertation addresses this issue in the context of Gaussian latent variable models (GLVM) by reviewing various more efficient methods for approximating the LOO-CV estimate. Based on the results, approaches with different levels of accuracy and computational complexity are proposed. As the variability of the LOO-CV estimator can be high in some problems, it is important to take the related uncertainty into account when applying the LOO-CV method in practice. The currently popular ways of estimating the uncertainty often lead to considerable underestimation of the variability.
The dissertation studies the behaviour of the uncertainty in a model comparison setting both theoretically and experimentally and identifies problematic cases, in which the estimated uncertainty is badly calibrated. The problematic cases include small data size, models making similar predictions, and model misspecification. In addition, the dissertation proposes an improved estimator for the variance of the LOO-CV estimator in the case of a Bayesian normal model. The proposed estimator serves as an example of the possibility of obtaining improved model-specific uncertainty estimates. This approach has not been discussed in the literature before.
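
As a rough illustration of the quantities the abstract refers to, the following Python sketch computes the LOO-CV estimate and a commonly used naive standard error for a conjugate normal model with known observation variance. It is only a minimal sketch under stated assumptions, not the dissertation's method: the function names, the prior parameters (`mu0`, `tau2`), the known variance `sigma2`, and the simulated data are all hypothetical, and the improved variance estimator proposed in the dissertation is not reproduced here.

```python
import math
import random


def log_normal_pdf(y, mean, var):
    """Log density of a normal distribution with the given mean and variance."""
    return -0.5 * (math.log(2 * math.pi * var) + (y - mean) ** 2 / var)


def loo_pointwise(y, sigma2=1.0, mu0=0.0, tau2=10.0):
    """Exact pointwise LOO log predictive densities.

    Assumed model (hypothetical, for illustration only):
        y_i ~ Normal(mu, sigma2) with known sigma2, and mu ~ Normal(mu0, tau2).
    """
    n = len(y)
    total = sum(y)
    pointwise = []
    for yi in y:
        # Posterior of mu given all observations except y_i (conjugate update).
        prec = 1.0 / tau2 + (n - 1) / sigma2
        post_mean = (mu0 / tau2 + (total - yi) / sigma2) / prec
        # Predictive variance = observation noise + posterior variance of mu.
        pred_var = sigma2 + 1.0 / prec
        pointwise.append(log_normal_pdf(yi, post_mean, pred_var))
    return pointwise


def elpd_and_naive_se(pointwise):
    """Sum of pointwise terms and the naive SE, sqrt(n * sample variance)."""
    n = len(pointwise)
    elpd = sum(pointwise)
    mean = elpd / n
    var = sum((p - mean) ** 2 for p in pointwise) / (n - 1)
    return elpd, math.sqrt(n * var)


# Simulated data, purely for demonstration.
random.seed(0)
y = [random.gauss(0.3, 1.0) for _ in range(30)]
pointwise = loo_pointwise(y)
elpd, se = elpd_and_naive_se(pointwise)
print(f"elpd_loo = {elpd:.2f}, naive SE = {se:.2f}")
```

The naive standard error treats the pointwise terms as independent, which is the kind of estimate the abstract describes as potentially badly calibrated for small data sets, similar models, or misspecified models.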
Translated title of the contribution: Jaottelevat menetelmät bayesilaisessa data-analytiikassa
Original language: English
Qualification: Doctor's degree
Awarding Institution
  • Aalto University
Supervisors/Advisors
  • Vehtari, Aki, Supervising Professor
  • Vehtari, Aki, Thesis Advisor
Publisher
Print ISBNs: 978-952-64-0268-0
Electronic ISBNs: 978-952-64-0269-7
Publication status: Published - 2021
MoE publication type: G5 Doctoral dissertation (article)

Keywords

  • Bayesian data analysis
  • model comparison
  • approximative distributed inference
  • Gaussian processes
