Multivariate systems with non-monotone dependencies occur in nature. Examples arise in neuroscience and economics, where several processes, that is, combinations of variables, co-occur but not necessarily linearly or monotonically. Kernel canonical correlation measures, namely the kernel canonical correlation and the Hilbert-Schmidt independence criterion, can capture the non-monotone dependencies in such systems. In general, however, these measures indicate whether a dependence exists in the system, but not directly which of its processes, or variables, are dependent. This thesis addresses the problem of interpreting kernel canonical correlation measures. First, we review the literature to identify the strategies proposed for interpreting the kernel canonical correlation. Second, we extend the existing technique by applying hierarchical clustering. Third, we propose a novel alternating projected gradient approach, gradKCCA, to compute both the kernel canonical correlation and the dependencies. Fourth, we present a novel alternating stochastic projected gradient algorithm, SCCA-HSIC, which can be combined with the Nyström approximation, to optimise the Hilbert-Schmidt independence criterion with respect to the dependencies. The first article identifies a correlation-based strategy for interpreting the kernel canonical correlation. The second article presents a clustergram visualisation that summarises, on a heatmap, the correlations obtained by the first technique. The clustergram visualisation is demonstrated on a real-world dataset obtained from deep bedrock groundwaters. In the third article, gradKCCA is shown to be fast, scalable, and accurate in identifying the dependencies relative to state-of-the-art methods, both in simulation studies and on real-world datasets. In the fourth article, SCCA-HSIC is shown to have superior accuracy and scalability compared with state-of-the-art methods when evaluated on simulated and real-world datasets. The proposed methods and algorithms provide tools for better interpretation of the underlying, possibly non-monotone, dependent processes. These tools can easily be deployed by practitioners seeking to understand the functioning of a multivariate system. Finally, the presented optimisation strategies can be considered for alternative kernel-based dependence measures.
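To make the notion of a non-monotone dependence concrete, the following minimal sketch computes the standard biased empirical Hilbert-Schmidt independence criterion, trace(KHLH)/(n-1)^2, on toy data where y depends on x quadratically. It is not gradKCCA or SCCA-HSIC and does not reproduce any experiment from the thesis. The Gaussian kernel, the bandwidth of 0.5, the sample size, and the simulated data are all assumptions chosen for illustration.

```python
import numpy as np


def rbf_gram(x, sigma):
    """Gaussian (RBF) Gram matrix over the rows of x."""
    sq = np.sum(x ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * x @ x.T
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))


def hsic(x, y, sigma_x=0.5, sigma_y=0.5):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2."""
    n = x.shape[0]
    K = rbf_gram(x, sigma_x)
    L = rbf_gram(y, sigma_y)
    H = np.eye(n) - np.ones((n, n)) / n  # centring matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2


# Toy data (assumed): y is a noisy quadratic function of x,
# so the dependence is non-monotone.
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(-1.0, 1.0, size=(n, 1))
y_dep = x ** 2 + 0.05 * rng.normal(size=(n, 1))

# Shuffling the rows of y breaks the pairing with x while keeping
# the marginal distribution, giving an independent reference sample.
y_ind = rng.permutation(y_dep)

r = np.corrcoef(x[:, 0], y_dep[:, 0])[0, 1]  # Pearson correlation
hsic_dep = hsic(x, y_dep)
hsic_ind = hsic(x, y_ind)

print(f"Pearson correlation (dependent pair): {r:.3f}")
print(f"HSIC, dependent pair:   {hsic_dep:.5f}")
print(f"HSIC, independent pair: {hsic_ind:.5f}")
```

In this setting the Pearson correlation is expected to stay near zero while the HSIC of the dependent pair exceeds that of the shuffled pair. Note that the single HSIC value signals that a dependence exists but does not say which variables drive it, which is the interpretation problem the thesis addresses.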
Translated title of the contribution: Menetelmiä ydinfunktioilla laajennettujen riippuvuusmittojen tulkintaan
Published - 2020
MoE publication type: G5 Doctoral dissertation (article)
- canonical correlation
- kernel methods