Efficient and trustworthy methods for knowledge discovery

Research output: Thesis › Doctoral Thesis › Collection of Articles

Abstract

Data are the building blocks of information and, in turn, a vital input to knowledge. Today, in the midst of the digital era, vast quantities of highly complex data are being collected and processed at an unprecedented scale. This abundance of data has highlighted the importance of efficient and effective knowledge-discovery algorithms that identify patterns hidden in the data, with the ultimate aim of uncovering valuable knowledge and shaping our understanding of the world around us. To capitalize on the opportunities offered by massive amounts of data and modern computing power, research in knowledge discovery and related areas has, for many years, introduced algorithms that are increasingly efficient and effective, but also increasingly opaque and unpredictable. Recently, growing interest in the ethical dimensions of algorithms has drawn attention to the limitations of opaque algorithms and has emphasized the need for trustworthy algorithms, particularly when such algorithms support high-stakes decision making. To be trustworthy, an algorithm should solve a clearly defined problem via a clear sequence of instructions, it should not fail completely on any particular instance, and it should be easy for humans to understand and interpret, so that no harmful biases can be hidden. In this thesis, we pursue the goal of developing novel algorithmic methods for knowledge discovery that are not only efficient enough to meet the challenges and opportunities posed by modern data, but also trustworthy. In particular, we propose efficient and trustworthy methods for a collection of popular knowledge-discovery tasks. First, we consider exact inference tasks in Bayesian networks and hidden Markov models. Trustworthy approaches for such tasks exist; however, their applicability may be severely limited by their time or memory requirements.
Therefore, we propose novel methods that reduce the time or memory needed by existing approaches to the exact inference tasks considered. Besides exact inference, we also consider two other knowledge-discovery tasks that arise naturally in modern data: multi-label classification and community search in temporal graphs. For multi-label classification, we propose an efficient and accurate rule-based multi-label classifier that drastically improves upon the interpretability of existing solutions. For community search in temporal graphs, we formalize the task for the first time and propose a solution that guarantees high efficiency and interpretability. In designing knowledge-discovery methods, we often rely on existing database-management and probabilistic methods. Database-management methods are valuable for addressing the large scale and high complexity of modern data, while probabilistic methods are essential for handling uncertainty in the data in a principled way.
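The abstract notes that existing exact-inference approaches can be limited by their time or memory requirements. As a point of reference only, the following is a minimal sketch of the textbook forward-backward algorithm for a discrete hidden Markov model; it is not the method proposed in the thesis. It keeps the full forward and backward tables, so memory grows as O(T·K) for a sequence of length T over K states, which illustrates the kind of cost that memory-reduction techniques target. The function name `forward_backward` and the list-of-lists parameterization (`pi`, `A`, `B`, `obs`) are illustrative assumptions.

```python
import math


def forward_backward(pi, A, B, obs):
    """Textbook forward-backward for a discrete HMM (illustrative sketch).

    pi[k]   : initial probability of state k
    A[i][j] : probability of moving from state i to state j
    B[k][o] : probability of emitting symbol o in state k
    obs     : observed symbol indices, length T

    Returns (posteriors, log_likelihood), where posteriors[t][k] is
    P(state at time t = k | obs). The full T x K forward and backward
    tables are kept in memory, so memory use is O(T * K).
    """
    K, T = len(pi), len(obs)

    # Forward pass, normalising each step to avoid numerical underflow.
    alpha, scale = [], []
    for t in range(T):
        if t == 0:
            raw = [pi[k] * B[k][obs[0]] for k in range(K)]
        else:
            raw = [B[j][obs[t]] * sum(alpha[t - 1][i] * A[i][j] for i in range(K))
                   for j in range(K)]
        c = sum(raw)
        scale.append(c)
        alpha.append([a / c for a in raw])

    # Backward pass, reusing the forward scaling constants.
    beta = [[1.0] * K for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(K))
                   / scale[t + 1]
                   for i in range(K)]

    # With this scaling, alpha * beta is already the posterior; renormalise
    # anyway to absorb floating-point rounding.
    posteriors = []
    for t in range(T):
        g = [alpha[t][k] * beta[t][k] for k in range(K)]
        z = sum(g)
        posteriors.append([x / z for x in g])

    # The product of the scaling constants equals P(obs).
    log_likelihood = sum(math.log(c) for c in scale)
    return posteriors, log_likelihood


if __name__ == "__main__":
    pi = [0.6, 0.4]
    A = [[0.7, 0.3], [0.4, 0.6]]
    B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
    post, ll = forward_backward(pi, A, B, [0, 1, 2])
    for t, row in enumerate(post):
        print(t, [round(p, 3) for p in row])
    print("log P(obs) =", round(ll, 4))
```

A common generic way to lower the O(T·K) memory is to store only some forward columns (checkpointing) and recompute the rest, trading extra time for less memory; this is mentioned as a standard technique, not as a description of the thesis's contribution.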
Translated title of the contribution: Efficient and trustworthy methods for knowledge discovery
Original language: English
Qualification: Doctor's degree
Awarding Institution
  • Aalto University
Supervisors/Advisors
  • Gionis, Aristides, Supervising Professor
  • Aslay, Cigdem, Thesis Advisor
Print ISBN: 978-952-64-1557-4
Electronic ISBN: 978-952-64-1558-1
Publication status: Published, 2023
MoE publication type: G5 Doctoral dissertation (article)

Keywords

  • knowledge discovery
  • trustworthy algorithms
  • scalable algorithms
