Algorithms for Data-Efficient Training of Deep Neural Networks

Research output: Thesis, Doctoral Thesis, Collection of Articles

Abstract

Deep Neural Networks ("deep learning") have become a ubiquitous choice for Machine Learning applications. These systems often achieve human-level or even superhuman performance across a variety of tasks and domains, such as computer vision, natural language processing, speech recognition, reinforcement learning, generative modeling and healthcare. This success can be attributed to their ability to learn complex representations directly from raw input data, eliminating hand-crafted feature extraction from the pipeline. However, there is a caveat: because Deep Neural Networks have an extremely large number of trainable parameters, their generalization ability depends heavily on the availability of a large amount of labeled data. In many machine learning applications, gathering a large amount of labeled data is not feasible due to privacy, cost, time or expertise constraints. Such applications are abundant in healthcare; one example is predicting the effect of a medicine on a new patient when the medicine has previously been administered to only a few patients. This thesis addresses the problem of improving the generalization ability of Deep Neural Networks using a limited amount of labeled data. More specifically, it explores a class of methods that directly incorporate into the learning algorithm an inductive bias about how Deep Neural Networks should "behave" in between the training samples (both in the input space and in the hidden space). Across the publications included in this thesis, the author demonstrates that such methods can outperform conventional baseline methods and achieve state-of-the-art performance across supervised, unsupervised, semi-supervised, adversarial training and graph-based learning settings.
In addition to these algorithms, the author proposes a mutual-information-based method for learning representations for "graph-level" tasks in an unsupervised and semi-supervised manner. Finally, the author proposes a method to improve the generalization of ResNets based on the iterative inference view.
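
The abstract does not name a specific algorithm, so the following is only an illustrative sketch of what constraining behavior "in between the training samples" can look like. It assumes a convex combination of two training pairs, with the mixing coefficient drawn from a Beta distribution; the function names, the one-hot label encoding and the `alpha` parameter are all hypothetical choices made for this example, not details taken from the thesis.

```python
import random


def interpolate_pair(x_a, y_a, x_b, y_b, lam):
    """Return the convex combination of two (features, one-hot label) pairs.

    Hypothetical helper: lam weights the first pair and (1 - lam) the second,
    so the result lies on the line segment between the two samples.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if len(x_a) != len(x_b) or len(y_a) != len(y_b):
        raise ValueError("paired vectors must have equal length")

    def mix(u, v):
        return [lam * a + (1.0 - lam) * b for a, b in zip(u, v)]

    return mix(x_a, x_b), mix(y_a, y_b)


def sample_lambda(alpha=1.0, rng=random):
    """Draw a mixing coefficient from Beta(alpha, alpha) (an assumed choice)."""
    return rng.betavariate(alpha, alpha)


# Two toy samples from different classes, mixed with lam = 0.25.
x_mixed, y_mixed = interpolate_pair(
    [0.0, 2.0], [1.0, 0.0],   # sample A and its one-hot label
    [4.0, 6.0], [0.0, 1.0],   # sample B and its one-hot label
    lam=0.25,
)
```

Training on the interpolated pair asks the network to produce the interpolated target at the interpolated point. Under the same assumptions, the combination could equally be applied to hidden-layer activations rather than raw inputs.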
Translated title of the contribution: Algorithms for Data-Efficient Training of Deep Neural Networks
Original language: English
Qualification: Doctor's degree
Awarding Institution
  • Aalto University
Supervisors/Advisors
  • Kannala, Juho, Supervising Professor
  • Bengio, Yoshua, Thesis Advisor, External person
  • Raiko, Tapani, Thesis Advisor
  • Karhunen, Juha, Thesis Advisor
Publisher
Print ISBNs: 978-952-64-0159-1
Electronic ISBNs: 978-952-64-0160-7
Publication status: Published - 2020
MoE publication type: G5 Doctoral dissertation (article)

Keywords

  • deep neural networks
  • machine learning
