This dissertation analyses two popular methods in Bayesian data analysis that involve splitting the data set into disjoint sets. The analysed approximative methods are expectation propagation (EP) and leave-one-out cross-validation (LOO-CV), which are used for distributed inference and for model evaluation and comparison, respectively. The main contribution of the dissertation is in analysing the applicability and behaviour of the methods in different situations.

The EP algorithm is a popular method for approximating a factorisable density. In the Bayesian context, for tractability, it has usually been applied pointwise. However, by including multiple observations in one approximated factor component, the method can be seen as a flexible framework for distributed inference. In addition, in hierarchical settings, it provides a convenient means for dimension reduction by concentrating parameter inferences into separate units.

LOO-CV is a popular method used in model evaluation, comparison, and weighting for estimating the out-of-sample predictive performance of a model using the given observations. In some situations, obtaining the estimate is computationally heavy. The dissertation addresses this issue in the context of Gaussian latent variable models (GLVM) by reviewing various more efficient methods for approximating the LOO-CV estimate. Based on the results, approaches with different levels of accuracy and computational complexity are proposed.

As the variability of the LOO-CV estimator can be high in some problems, it is important to take the related uncertainty into account when applying the LOO-CV method in practice. The currently popular ways of estimating the uncertainty often lead to considerable underestimation of the variability. The dissertation studies the behaviour of the uncertainty in a model comparison setting both theoretically and experimentally and identifies problematic cases in which the estimated uncertainty is badly calibrated. The problematic cases include small data size, models making similar predictions, and model misspecification. In addition, the dissertation proposes an improved estimator for the variance of the LOO-CV estimator in the case of a Bayesian normal model. The proposed estimator serves as an example of the possibility of obtaining improved model-specific uncertainty estimates. This approach has not been discussed in the literature before.
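As a rough illustration of the quantities discussed above, the following minimal sketch computes an exact LOO-CV estimate and a simple standard error for a toy normal model. It is not the estimator proposed in the dissertation. The model setup (known observation variance, conjugate normal prior on the mean), the prior values, the simulated data, and the function names `loo_pointwise` and `elpd_loo_with_se` are all assumptions made for this example. The standard error shown is the commonly used one based on the sample variance of the pointwise terms, which is assumed here to be representative of the kind of estimate that can understate the true variability.

```python
import math
import random


def normal_logpdf(y, mean, var):
    """Log density of N(mean, var) evaluated at y."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (y - mean) ** 2 / var)


def loo_pointwise(ys, sigma2=1.0, prior_mean=0.0, prior_var=100.0):
    """Exact LOO log predictive densities (hypothetical helper).

    Assumed model: y_i ~ N(mu, sigma2) with sigma2 known,
    and prior mu ~ N(prior_mean, prior_var).
    """
    n = len(ys)
    total = sum(ys)
    pointwise = []
    for y in ys:
        # Conjugate posterior of mu given all observations except y.
        post_prec = 1.0 / prior_var + (n - 1) / sigma2
        post_var = 1.0 / post_prec
        post_mean = post_var * (prior_mean / prior_var + (total - y) / sigma2)
        # Predictive distribution for the held-out point.
        pointwise.append(normal_logpdf(y, post_mean, post_var + sigma2))
    return pointwise


def elpd_loo_with_se(pointwise):
    """Sum of pointwise terms and the simple SE sqrt(n * sample variance)."""
    n = len(pointwise)
    elpd = sum(pointwise)
    mean = elpd / n
    var = sum((p - mean) ** 2 for p in pointwise) / (n - 1)
    return elpd, math.sqrt(n * var)


random.seed(0)
ys = [random.gauss(1.0, 1.0) for _ in range(50)]  # simulated data, illustration only
pointwise = loo_pointwise(ys)
elpd, se = elpd_loo_with_se(pointwise)
print(f"elpd_loo = {elpd:.2f}, simple SE = {se:.2f}")
```

Because the toy model is conjugate, each leave-one-out predictive density is available in closed form and no refitting or sampling is needed. For models without such closed forms, the pointwise terms would have to be approximated, which is where the computational cost mentioned above arises.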
|Translated title of the contribution||Jaottelevat menetelmät bayesilaisessa data-analytiikassa|
|Publication status||Published - 2021|
|MoE publication type||G5 Doctoral dissertation (article)|
- Bayesian data analysis
- model comparison
- approximative distributed inference
- Gaussian processes