This thesis discusses Bayesian statistical inference in supervised learning problems where the data are scarce but the number of features is large. The focus is on two tasks. The first is the prediction of a target variable of interest. The second is feature selection, where the goal is to identify a small subset of features that are relevant for prediction. Good predictive accuracy is often both intrinsically valuable and a means of understanding the data. Feature selection can further make the model easier to interpret and reduce future costs when there is a price associated with predicting with many features.
Most traditional approaches try to solve both problems at once by formulating an estimation procedure that performs automatic or semiautomatic feature selection as a by-product of fitting the predictive model. This thesis argues that in many cases one can benefit from a decision-theoretically justified two-stage approach: one first constructs a model that predicts well but possibly uses many features, and then finds a minimal subset of features that can characterize the predictions of this model. The basic idea of this so-called projective framework has been around for a long time, but it has largely been overlooked in the statistics and machine learning communities. The approach offers plenty of freedom for building an accurate prediction model, since one need not worry about feature selection at that stage, and it turns out that the feature selection problem often becomes substantially easier once an accurate prediction model is available as a reference.
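The two-stage idea can be illustrated with a minimal NumPy sketch. Everything here is invented for illustration (the toy data, the ridge estimate standing in for a Bayesian reference model, squared error standing in for a proper projection discrepancy); it is a sketch of the general idea, not the method as developed in the thesis. The key point is that in the second stage the submodels are fitted to the *reference model's predictions*, not to the raw targets.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: n = 60 observations, p = 30 features,
# only the first three features carry signal.
n, p = 60, 30
X = rng.normal(size=(n, p))
y = X[:, 0] - 2 * X[:, 1] + 0.5 * X[:, 2] + 0.3 * rng.normal(size=n)

def ridge_fit(X, y, lam=1.0):
    """Ridge estimate, used here as a stand-in for the posterior
    mean of a Bayesian linear reference model."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Stage 1: the reference model predicts well but uses all features.
w_ref = ridge_fit(X, y)
mu_ref = X @ w_ref  # reference predictions

# Stage 2: greedy forward search for a small feature subset whose
# submodel best replicates the reference predictions.
selected, remaining = [], list(range(p))
for _ in range(5):
    best_j, best_err = None, np.inf
    for j in remaining:
        cols = selected + [j]
        w_sub = ridge_fit(X[:, cols], mu_ref)  # "project" onto the submodel
        err = np.mean((X[:, cols] @ w_sub - mu_ref) ** 2)
        if err < best_err:
            best_j, best_err = j, err
    selected.append(best_j)
    remaining.remove(best_j)

print("selected features:", selected)
```

In this sketch the noisy target y enters only through the reference fit; the search itself matches the smooth reference predictions, which is what tends to make the selection problem easier and more stable.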
The thesis focuses mostly on generalized linear models. For predictive model construction, it introduces novel methods for encoding prior information about sparsity and regularization into the model. These methods can in some cases improve prediction accuracy and robustify the posterior inference, and they also advance the theoretical understanding of the fundamental characteristics of some commonly used prior distributions. The thesis also explores computationally efficient dimension reduction techniques that can serve as shortcuts for predictive model construction when the number of features is very large. Furthermore, it develops the existing projective feature selection method further so as to make the computation fast and accurate for a large number of features. Finally, it takes the initial steps towards extending this framework to nonlinear and nonparametric Gaussian process models. The contributions of the thesis are solely methodological, but the benefits of the proposed methods are illustrated on example datasets from various fields, in particular computational genetics.
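As a rough illustration of what a dimension-reduction shortcut can look like in the wide-data regime, the sketch below screens features by their association with the target and then compresses the survivors to a few principal components, in the spirit of supervised principal components. All data sizes and tuning constants are invented, and this is not presented as the specific technique of the thesis.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical "wide" data: n = 100 observations, p = 500 features,
# with only the first five features carrying signal.
n, p = 100, 500
X = rng.normal(size=(n, p))
y = X[:, :5].sum(axis=1) + 0.5 * rng.normal(size=n)

# Step 1: screen features by absolute covariance with the centered target.
score = np.abs(X.T @ (y - y.mean())) / n
keep = np.argsort(score)[-50:]  # retain the 50 highest-scoring features

# Step 2: compress the retained features to a few principal components,
# which can then serve as inputs of an ordinary low-dimensional model.
Xk = X[:, keep] - X[:, keep].mean(axis=0)
_, _, Vt = np.linalg.svd(Xk, full_matrices=False)
Z = Xk @ Vt[:3].T  # n x 3 matrix of supervised components

print("compressed design matrix shape:", Z.shape)
```

The point of such shortcuts is computational: a full Bayesian model over all p features is replaced by a model over a handful of components, at the cost of some approximation in the reference model.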
Publication status: Published - 2019
MoE publication type: G5 Doctoral dissertation (article)
Keywords: Bayesian generalized linear models, feature selection, dimension reduction