Abstract
Modern technical capabilities and systems have resulted in an abundance of data. A significant portion of all that data is of temporal nature. Hence, it becomes imperative to design effective and efficient algorithms, and solutions that enable searching and analysing large databases of temporal data. This thesis contains several contributions related to the broad scientific field of temporaldata analysis. First, we present a distance function for pairs of eventinterval sequences, together with proofs of important properties, such as that the function is a metric, and a lowerbounding function. An embeddingbased indexing method is proposed for searching through large databases of eventinterval sequences, under this distance function. Second, we study the problem of subsequence search for eventinterval sequences. This includes hardness results, an exact worstcase exponentialtime algorithm, two upper bounds and a scheme for approximation algorithms. In addition, an equivalence is established between graphs and eventinterval sequences. This equivalence allows to derive hardness results for several problems of eventinterval sequences. Most importantly, it raises the question which techniques, results, and methods from each of the fields of graph mining and temporal data mining can be applied to the other that would advance the current state of the art. Third, for the problem of subsequence search, we propose an indexing method based on decomposing eventinterval sequences into 2interval patterns. The proposed indexing method is benchmarked against other approaches. In addition, we examine different variations of the problem and propose exact algorithms for solving them. Fourth, we describe a complete system that enables the clustering of a stream of graphs. The graphs are clustered into groups based on their distances, via approximating the graph edit distance. The proposed clustering algorithm achieves a good clustering with few graph comparisons. The effectiveness and usefulness of the systems is demonstrated by clustering function callgraphs of binary executable files for the purpose of malware detection. Finally, we solve the problem of summarising temporal networks. We assume that networks operate in certain modes and that the total sequence of interactions can be modelled as a series of transitions between these modes. We prove hardness results and provide heuristic procedures for finding approximate solutions. We demonstrate the quality of our methods via benchmarking and performing casestudies on datasets taken from sports and social networks.
Translated title of the contribution  Advances in Analysing Temporal Data 

Original language  English 
Qualification  Doctor's degree 
Awarding Institution 

Supervisors/Advisors 

Publisher  
Print ISBNs  9789526073712 
Electronic ISBNs  9789526073705 
Publication status  Published  2017 
MoE publication type  G5 Doctoral dissertation (article) 
Keywords
 temporal data
 eventinterval sequences
 temporal networks
 streams
 graphs
Fingerprint Dive into the research topics of 'Advances in Analysing Temporal Data'. Together they form a unique fingerprint.
Cite this
Kostakis, O. (2017). Advances in Analysing Temporal Data. Aalto University. http://urn.fi/URN:ISBN:9789526073705