Exploring correlated data: confidence bands and projections of shared variation

Jussi Korpela

Research output: Thesis › Doctoral Thesis › Collection of Articles

Abstract

The steady increase in automatic data collection and analysis creates new possibilities for data-driven decision making. Consequently, there is a need for the development of new explorative data analysis methods. This thesis deals with two such methods: multivariate confidence intervals and finding shared variation between datasets.

First, we present a method to visualize the variation of a set of vector-valued data items. The visualization is a two-dimensional confidence band, whose interpretation is similar to that of a one-dimensional confidence interval. The goal is to have a band that covers a predefined fraction of the probability mass of the data vector distribution, such that the band can be used to assess likely values for a typical vector. We introduce new methods to compute the bands and describe in more detail the technical implementations of existing methods. In addition, we present a correction procedure that adjusts the coverage properties of the band when it is computed from a finite sample.

The second part of the work deals with finding shared variation between the datasets of a data collection. The analysis is applied to data collections that describe a certain process from multiple views, and hence the shared variation becomes a measure of the underlying process. The method can be used to find the periods during which the datasets share variation with each other. To solve the problem, we propose a filtering approach based on ordinary regression functions. The algorithm filters away all variation that is not shared by all of the datasets. Advantages of the method include easy implementation and adaptability: by changing the regression function, one can easily change the definition of shared variation to match the problem at hand.

Confidence bands have many applications in expressing the variability of time series and other vector-valued data. A prime example is time series model forecasts, whose modeling uncertainty is often visualized using a confidence band. Analysis of shared variation, on the other hand, is often needed in conjunction with biosignal analysis, where one might be interested in, e.g., finding shared and unshared changes in signal level between test subjects.
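The coverage goal stated in the abstract can be illustrated with a small sketch. It is not the thesis's algorithm: it uses synthetic random-walk data, a naive per-coordinate quantile band, and a hypothetical grid search (`widened_band`) that widens the band until the requested fraction of whole vectors lies inside it. All function names and parameters are assumptions made for this illustration.

```python
import numpy as np


def pointwise_band(X, alpha=0.1):
    """Naive band: independent (alpha/2, 1 - alpha/2) quantiles per coordinate."""
    lo = np.quantile(X, alpha / 2, axis=0)
    hi = np.quantile(X, 1 - alpha / 2, axis=0)
    return lo, hi


def simultaneous_coverage(X, lo, hi):
    """Fraction of rows (vectors) that lie entirely inside the band."""
    inside = np.all((X >= lo) & (X <= hi), axis=1)
    return float(inside.mean())


def widened_band(X, alpha=0.1, steps=51):
    """Widen the quantile band until at least 1 - alpha of the rows fit inside.

    Hypothetical grid search for illustration only; not the thesis's algorithm.
    """
    for beta in np.linspace(alpha / 2, 0.0, steps):
        lo = np.quantile(X, beta, axis=0)
        hi = np.quantile(X, 1 - beta, axis=0)
        if simultaneous_coverage(X, lo, hi) >= 1 - alpha:
            return lo, hi
    return lo, hi  # beta = 0 is the min/max envelope, which covers every row


# Synthetic correlated data (an assumption): 200 random walks of length 50.
rng = np.random.default_rng(0)
X = np.cumsum(rng.normal(size=(200, 50)), axis=1)

naive = simultaneous_coverage(X, *pointwise_band(X))
corrected = simultaneous_coverage(X, *widened_band(X))
print(f"pointwise band covers {naive:.2f} of the vectors")
print(f"widened band covers {corrected:.2f} of the vectors")
```

A band built from per-coordinate quantiles contains about 1 - alpha of the values at each coordinate, but the fraction of vectors that stay inside it at every coordinate can be no larger and is typically smaller. This sketch measures coverage on the same sample used to build the band, so it does not address the finite-sample correction mentioned in the abstract.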
Translated title of the contribution: Korreloitunutta dataa tutkimassa: luottamusnauhat ja jaetun vaihtelun kuvaaminen
Original language: English
Qualification: Doctor's degree
Awarding Institution
  • Aalto University
Supervisors/Advisors
  • Puolamäki, Kai, Supervising Professor
  • Gionis, Aristides, Thesis Advisor
Publisher
Print ISBNs: 978-952-60-7854-0
Electronic ISBNs: 978-952-60-7855-7
Publication status: Published - 2018
MoE publication type: G5 Doctoral dissertation (article)

Keywords

  • time series
  • confidence band
  • simultaneous confidence interval
  • visualization
  • shared variation
  • regression
