In recent years, machine learning methods have become increasingly popular for modelling many different phenomena: financial markets, spatio-temporal data sets, pattern recognition, speech and image processing, recommender systems and many others. This interest stems from the success of their application and from the increasing ease of acquiring, storing and accessing data. In this thesis, two general problems in machine learning are discussed and several solutions are offered. The first problem is variable selection, an approach to automatically selecting the most relevant features in the data. Two key components of variable selection are the search criterion and the search algorithm. The thesis focuses on the Delta test as the search criterion, while several solutions are offered for the search algorithm, such as the Genetic Algorithm and Tabu Search. Furthermore, the selection procedure is extended to the more general cases of scaling and projection, as well as their combination. Finally, some of the proposed solutions have been developed for parallel architectures, which enables the whole variable selection procedure to be applied to data sets with a large number of features. The second problem tackled in the thesis is time series prediction, which arises in many fields of science and industry. Put simply, time series prediction involves estimating future values of a series of measurements of the phenomenon of interest. The number of estimated values can be small, giving short-term prediction, or run to several hundred, giving long-term prediction. Two models have been developed for this task. One is based on a recently popular neural network type called the Extreme Learning Machine, while the other combines Generative Topographic Mapping and Relevance Learning, modified for regression tasks. Finally, the two problems are tackled together for real-world time series from a biological domain.
Inference in biological time series is difficult because of the very small number of available samples and the irregular sampling frequency and spatial coverage of the areas of interest. Nevertheless, more stable estimation of model parameters is possible when global climate indicators and regional measurements are used together in a multifactor approach.
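As an illustration of the search criterion named in the abstract, the following is a minimal sketch of the Delta test in its usual nearest-neighbour form: the noise variance is estimated as half the mean squared difference between each output and the output of its nearest neighbour in the selected input space. The function names `delta_test` and `exhaustive_selection` are hypothetical, and the exhaustive search is only a stand-in suitable for a handful of variables; the thesis itself uses search algorithms such as the Genetic Algorithm and Tabu Search, which are not reproduced here.

```python
import itertools

import numpy as np


def delta_test(X, y):
    """Delta test: 1/(2N) * sum_i (y_i - y_NN(i))^2,
    where NN(i) is the nearest neighbour of x_i among the other samples."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Pairwise squared Euclidean distances between all samples.
    diff = X[:, None, :] - X[None, :, :]
    d2 = np.einsum("ijk,ijk->ij", diff, diff)
    # A sample must not be chosen as its own nearest neighbour.
    np.fill_diagonal(d2, np.inf)
    nn = np.argmin(d2, axis=1)
    return 0.5 * np.mean((y - y[nn]) ** 2)


def exhaustive_selection(X, y):
    """Hypothetical stand-in search: evaluate every non-empty variable
    subset and keep the one with the smallest Delta test value."""
    X = np.asarray(X, dtype=float)
    n_features = X.shape[1]
    best_subset, best_delta = None, np.inf
    for k in range(1, n_features + 1):
        for subset in itertools.combinations(range(n_features), k):
            d = delta_test(X[:, list(subset)], y)
            if d < best_delta:  # strict: ties keep the smaller subset found first
                best_subset, best_delta = subset, d
    return best_subset, best_delta


# Toy data (made up for illustration): y equals the first variable,
# while the second variable only distorts the neighbourhood structure.
X_toy = np.array([[0.0, 0.0], [1.0, 100.0], [2.0, 0.0], [3.0, 100.0]])
y_toy = np.array([0.0, 1.0, 2.0, 3.0])
subset, delta = exhaustive_selection(X_toy, y_toy)
print(subset, delta)
```

The pairwise distance matrix costs O(N²) memory, and the exhaustive loop visits 2^d − 1 subsets for d variables. The second cost is why heuristic search is needed once the number of features grows.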
|Translated title of the contribution||Learning Methods for Variable Selection and Time Series Prediction|
|Publication status||Published - 2014|
|MoE publication type||G5 Doctoral dissertation (article)|
- variable selection/scaling/projection
- time series prediction
- environmental modelling
- model structure selection