Modern technical capabilities and systems have resulted in an abundance of data. A significant portion of all that data is of temporal nature. Hence, it becomes imperative to design effective and efficient algorithms, and solutions that enable searching and analysing large databases of temporal data. This thesis contains several contributions related to the broad scientific field of temporal-data analysis. First, we present a distance function for pairs of event-interval sequences, together with proofs of important properties, such as that the function is a metric, and a lower-bounding function. An embedding-based indexing method is proposed for searching through large databases of event-interval sequences, under this distance function. Second, we study the problem of subsequence search for event-interval sequences. This includes hardness results, an exact worst-case exponential-time algorithm, two upper bounds and a scheme for approximation algorithms. In addition, an equivalence is established between graphs and event-interval sequences. This equivalence allows to derive hardness results for several problems of event-interval sequences. Most importantly, it raises the question which techniques, results, and methods from each of the fields of graph mining and temporal data mining can be applied to the other that would advance the current state of the art. Third, for the problem of subsequence search, we propose an indexing method based on decomposing event-interval sequences into 2-interval patterns. The proposed indexing method is benchmarked against other approaches. In addition, we examine different variations of the problem and propose exact algorithms for solving them. Fourth, we describe a complete system that enables the clustering of a stream of graphs. The graphs are clustered into groups based on their distances, via approximating the graph edit distance. The proposed clustering algorithm achieves a good clustering with few graph comparisons. The effectiveness and usefulness of the systems is demonstrated by clustering function call-graphs of binary executable files for the purpose of malware detection. Finally, we solve the problem of summarising temporal networks. We assume that networks operate in certain modes and that the total sequence of interactions can be modelled as a series of transitions between these modes. We prove hardness results and provide heuristic procedures for finding approximate solutions. We demonstrate the quality of our methods via benchmarking and performing case-studies on datasets taken from sports and social networks.
|Translated title of the contribution||Advances in Analysing Temporal Data|
|Publication status||Published - 2017|
|MoE publication type||G5 Doctoral dissertation (article)|
- temporal data
- event-interval sequences
- temporal networks