Multilabel Classification through Structured Output Learning - Methods and Applications

Translated title of the contribution: Multilabel Classification through Structured Output Learning - Methods and Applications

Hongyu Su

Research output: Doctoral Thesis, Collection of Articles

Abstract

Multilabel classification is an important topic in machine learning that arises naturally in many real-world applications. For example, in document classification, a research article can be categorized as “science”, “drug discovery” and “genomics” at the same time. The goal of multilabel classification is to reliably predict multiple outputs for a given input. As multiple interdependent labels can be “on” or “off” simultaneously, the central problem in multilabel classification is how best to exploit the correlation between labels to make accurate predictions. In contrast to earlier flat multilabel classification approaches, which treat multiple labels as a flat vector, structured output learning relies on an output graph connecting the labels to model the correlation between them in a comprehensive manner.
The main question studied in this thesis is how to tackle multilabel classification through structured output learning. The thesis starts with an extensive review of classification learning, covering both single-label and multilabel classification. The first problem we address is how to solve the multilabel classification problem when the output graph is observed a priori. We discuss several well-established structured output learning algorithms and study the network response prediction problem in the context of social network analysis. As current structured output learning algorithms rely on the output graph to exploit the dependency between labels, the second problem we address is how to use structured output learning when the output graph is not known. Specifically, we examine the potential of learning on a set of random output graphs when the “real” one is hidden. This problem is relevant because in most multilabel classification problems there is no output graph that reveals the dependency between labels. The third problem we address is how to analyze the proposed learning algorithms theoretically. Specifically, we want to explain the intuition behind the proposed models and to study their generalization error.
The main contributions of this thesis are several new learning algorithms that widen the applicability of structured output learning. For the problem with an observed output graph, the proposed algorithm “SPIN” is able to predict an optimal directed acyclic graph from an observed underlying network that best responds to an input. For general multilabel classification problems without any known output graph, we propose several learning algorithms that combine a set of structured output learners built on random output graphs. In addition, we develop a joint learning and inference framework based on max-margin learning over a random sample of spanning trees. The theoretical analysis also provides guarantees on the generalization error of the proposed methods.
Original language: English
Qualification: Doctor's degree
Awarding Institution
  • Aalto University
Supervisors/Advisors
  • Rousu, Juho, Supervising Professor
Publisher
Print ISBN: 978-952-60-6105-4
Electronic ISBN: 978-952-60-6106-1
Publication status: Published - 2015
MoE publication type: G5 Doctoral dissertation (article)
