Community detection in complex networks: the role of node metadata
Recently, it has been recognized that problems lying between order and chaos require a new scientific language and new models. Network science has emerged as a promising interdisciplinary field that studies the properties of systems arising from the interactions of a large number of elements or constituents. A particularly interesting feature of complex networks is the presence of communities, that is, groups of nodes that have more connections among themselves than to the rest of the network. Communities provide insight into the structure of the whole system and into the immediate environment of each node, such as circles of friends or functionally related genes, and they have also been shown to play a role in various processes on networks. For these reasons, numerous community detection algorithms have been proposed that take the network structure as input and return the communities to which the nodes belong. As the field of community detection matured, more scrutiny was applied to old and new algorithms. Researchers were no longer satisfied with good results on simple, almost toy examples, and sought more evidence that the algorithms are applicable in the real world. At the same time, larger and more complex network datasets were becoming available, in which the need to identify meso-scale structures was even greater. A straightforward way to test the algorithms is to compare their results with known node community assignments, which are taken to correspond to metadata labels on the nodes.
In the first part of this dissertation, a large number of algorithms were tested on a large number of labeled networks from different domains. The weak correspondence found between metadata and communities indicates that more care has to be taken when using metadata as community labels. The relationship between node metadata and communities is perhaps more complex than was previously assumed, but this does not mean that it is absent.
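As an illustration of what comparing detected communities with metadata labels can look like in practice, the following minimal sketch computes the normalized mutual information (NMI) between two labelings of the same nodes. NMI is one commonly used similarity score for partitions; it is assumed here for illustration and is not necessarily the measure used in the dissertation. The function name and the toy labelings are hypothetical.

```python
from collections import Counter
from math import log


def normalized_mutual_information(labels_a, labels_b):
    """NMI between two labelings of the same nodes.

    Normalized by the arithmetic mean of the two entropies, so the score
    is 1 for identical partitions (up to relabeling) and 0 for
    statistically independent ones.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same nodes")
    n = len(labels_a)
    if n == 0:
        raise ValueError("labelings must not be empty")

    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))  # joint label counts

    def entropy(counts):
        return -sum((c / n) * log(c / n) for c in counts.values())

    h_a, h_b = entropy(count_a), entropy(count_b)

    # Mutual information from the joint and marginal label frequencies.
    mi = sum(
        (c / n) * log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in count_ab.items()
    )

    if h_a + h_b == 0:
        return 1.0  # both labelings put every node in a single group
    return 2 * mi / (h_a + h_b)


# Hypothetical toy data: detected communities vs. metadata labels.
detected = [0, 0, 1, 1]
metadata_aligned = ["x", "x", "y", "y"]    # same grouping, different names
metadata_unrelated = ["x", "y", "x", "y"]  # cuts across the communities

score_aligned = normalized_mutual_information(detected, metadata_aligned)
score_unrelated = normalized_mutual_information(detected, metadata_unrelated)
```

A low score by itself does not show that an algorithm failed. As argued above, it may equally mean that the metadata capture something other than the community structure.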
The second part of this dissertation presents a novel approach for incorporating metadata into community detection without assuming their usefulness. This approach makes it possible to discriminate between metadata that are aligned with the community structure and those that are not. The third part of this dissertation proposes the use of the stochastic blockmodel for modeling the citation networks of journals. The model is able to capture the rich structures present in the data, while being simple, intuitive and applicable to huge networks (millions of nodes and links). By splitting data spanning more than a hundred years into separate time windows, it was possible to track the evolution of science over time. Using the model presented in the second part of the dissertation, the usefulness of the classification of journals into subject categories as a predictor of citation flows was also evaluated.
|Status||Published - 2017|
|MoE publication type||G5 Doctoral dissertation (article)|