Machine Learning for Networked Data

Research output: Thesis, Doctoral Thesis, Collection of Articles


The data arising in many important applications can be represented as networks. This network representation can be used to encode high-dimensional statistical relations in probabilistic graphical models (PGMs). Network models also allow (deterministic) methods of discrete-time signal processing to be extended to networked data.

This dissertation studies two fundamental problems in the processing of networked data. The first is semi-supervised learning: given the network structure and some labeled data points, one aims to learn a predictor for the labels of every data point. The second is learning the network structure itself in a fully data-driven fashion. We approach this structure learning problem using a probabilistic model for the data, which results in a graphical model selection (GMS) problem.

Using the underlying network structure of the data, it is possible to learn an accurate predictor from few labeled data points. This dissertation provides conditions on the available labels, relative to the network structure, under which accurate learning is possible by convex optimization methods. We apply the network Lasso, an instance of regularized risk minimization that uses the total variation as the regularizer. The conditions are derived by characterizing the solutions of the network Lasso.

GMS methods learn a network structure based on the statistical relations between data points, which are modeled as random variables. A key challenge in applying GMS methods is a precise understanding of the number of data points required for accurate GMS. This dissertation characterizes the required sample size for zero-mean Gaussian random processes.
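As an illustration of the network Lasso named in the abstract, the following is a minimal sketch, not the dissertation's implementation. It assumes scalar node signals, a squared-error loss on the labeled nodes, and a weighted total-variation penalty over the edges, and it solves the resulting convex problem with a generic primal-dual (Chambolle-Pock style) iteration chosen here for simplicity. The function name `network_lasso`, its parameters, and the toy chain graph are all hypothetical.

```python
import numpy as np


def network_lasso(edges, weights, labels, n_nodes, lam=1.0, n_iter=20000):
    """Approximately minimise
        0.5 * sum_{i labeled} (x_i - y_i)^2 + lam * sum_{(i,j)} w_ij * |x_i - x_j|
    with a primal-dual iteration (an assumed solver, chosen for illustration)."""
    # Edge-node incidence matrix D: row e has +1 at node i and -1 at node j.
    D = np.zeros((len(edges), n_nodes))
    for e, (i, j) in enumerate(edges):
        D[e, i], D[e, j] = 1.0, -1.0
    w = lam * np.asarray(weights, dtype=float)

    # Boolean mask of labeled nodes and their observed labels.
    mask = np.zeros(n_nodes, dtype=bool)
    y = np.zeros(n_nodes)
    for i, y_i in labels.items():
        mask[i], y[i] = True, y_i

    # Step sizes chosen so that tau * sigma * ||D||_2^2 < 1.
    tau = sigma = 0.9 / np.linalg.norm(D, 2)

    x = np.zeros(n_nodes)
    x_bar = x.copy()
    u = np.zeros(len(edges))
    for _ in range(n_iter):
        # Dual step: project onto the box |u_e| <= lam * w_e.
        u = np.clip(u + sigma * (D @ x_bar), -w, w)
        # Primal step: proximal map of the squared loss, applied to labeled nodes only.
        v = x - tau * (D.T @ u)
        x_new = np.where(mask, (v + tau * y) / (1.0 + tau), v)
        # Extrapolation step of the primal-dual scheme.
        x_bar = 2.0 * x_new - x
        x = x_new
    return x


# Toy chain of 10 nodes: two clusters joined by one weak edge between nodes 4 and 5.
edges = [(i, i + 1) for i in range(9)]
weights = [1.0] * 9
weights[4] = 0.1
labels = {1: 1.0, 7: 5.0}  # one labeled node per cluster
x_hat = network_lasso(edges, weights, labels, n_nodes=10, lam=1.0)
```

On this toy graph the learned signal should be close to constant within each cluster, with a single jump across the weak edge. This shows how two labels can suffice when the label locations match the cluster structure of the network.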


Original language: English
Qualification: Doctor's degree
Awarding Institution: Aalto University
Print ISBNs: 978-952-60-8917-1
Electronic ISBNs: 978-952-60-8918-8
Publication status: Published - 2020
MoE publication type: G5 Doctoral dissertation (article)

Research areas: networked data, graphical model selection, semi-supervised learning

ID: 41318313