Statistical analysis and modeling techniques are needed to extract information from the plethora of high-dimensional (HD) data generated by digitalization, growing computing power, and the sensors and smart devices of everyday life. Statistical learning from HD data remains challenging despite continuous improvements in computational resources and learning techniques. The performance of supervised learning approaches, such as regression or classification, often degrades when the number of observations (samples) is small compared to the data dimensionality (number of variables). Developing supervised learning methods for explanatory and predictive modeling of such HD data sets is therefore crucial. In this thesis, we propose new sparsity-driven methods for regression and classification that offer improved explanatory and predictive power.

New solvers, specially designed to handle complex-valued data, are proposed for sparsity-driven linear regression problems, namely the Lasso and the elastic net (EN). These solvers are applied for explanatory modeling to estimate the directions-of-arrival (DoAs) of sources impinging on a sensor array using the compressed beamforming (CBF) technique. The developed methods are, however, completely general and can be applied to various HD linear regression problems with complex- or real-valued data. Moreover, an approach called the sequential adaptive EN is developed to enhance recovery of the exact support of the sparse signal vector; it is then used to find the DoAs of sources within the CBF framework. Furthermore, the Lasso and EN regularization paths computed by the developed algorithm are combined with a generalized information criterion in a novel method for detecting the sparsity level of the signal, which corresponds to the number of sources in the DoA estimation problem.
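To make the complex-valued Lasso and EN problems mentioned above concrete, the following is a minimal sketch of a generic cyclic coordinate descent solver. It is not the solver developed in the thesis, which the abstract does not describe. The function names `soft_threshold` and `complex_elastic_net`, the parameters `lam` and `alpha`, and the particular scaling of the objective are all assumptions made for illustration only.

```python
import numpy as np

def soft_threshold(z, t):
    """Complex soft-thresholding: shrink the modulus of z by t, keep its phase."""
    mag = np.abs(z)
    return z * max(mag - t, 0.0) / mag if mag > 0 else 0.0 * z

def complex_elastic_net(X, y, lam, alpha=1.0, n_iter=200):
    """Illustrative cyclic coordinate descent (assumed formulation) for
        min_b 0.5*||y - X b||^2 + lam*(alpha*||b||_1 + 0.5*(1 - alpha)*||b||^2),
    where X, y and b may be complex-valued. alpha=1 gives the Lasso.
    """
    n, p = X.shape
    b = np.zeros(p, dtype=complex)
    r = y.astype(complex)                    # residual y - X b (b starts at zero)
    col_sq = np.sum(np.abs(X) ** 2, axis=0)  # squared column norms
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            xj = X[:, j]
            # x_j^H applied to the partial residual that excludes coordinate j
            # (np.vdot conjugates its first argument)
            z = np.vdot(xj, r) + col_sq[j] * b[j]
            b_new = soft_threshold(z, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            r = r + xj * (b[j] - b_new)      # keep the residual in sync
            b[j] = b_new
    return b
```

The only change relative to the real-valued case is the thresholding step, which shrinks the modulus of each coefficient while preserving its phase. Coefficients that are set exactly to zero define the estimated support, which in the CBF setting would correspond to the grid angles at which no source is detected.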
This thesis also proposes a compressive classification framework for predicting the class of a high-dimensional observation. The proposed compressive regularized discriminant analysis (CRDA)-based set of classifiers is applied to feature selection and classification of HD data, particularly gene expression data. The CRDA-based approach outperforms current state-of-the-art methods, which fail in at least one of three facets: accuracy, learning speed, and interpretability.
|Field|Value|
|---|---|
|Translated title of the contribution|Sparsity Driven Statistical Learning for High-Dimensional Regression and Classification|
|Publication status|Published - 2020|
|MoE publication type|G5 Doctoral dissertation (article)|

Keywords:
- feature selection
- high-dimensional statistics
- joint-sparse recovery
- statistical learning