Cluster ensemble selection with constraints

Research output: Contribution to journal › Article › Scientific › peer-review

Researchers

Research units

  • Xiamen University
  • Nanjing University of Posts and Telecommunications
  • Florida International University

Abstract

Clustering ensembles have emerged as an important tool for data analysis, making it possible to generate a more robust and accurate consensus clustering. Empirical studies suggest that better ensembles are obtained by considering the quality of the ensemble and the diversity among its members simultaneously. However, little research effort has been devoted to incorporating prior background knowledge. In this paper, we first provide a theoretical analysis of the effect of the diversity and quality of the ensemble members. We then propose a unified framework for solving the constraint-based clustering ensemble selection problem, in which instance-level must-link and cannot-link constraints are given as prior knowledge or background information. We formalize this problem as a combinatorial optimization problem in terms of the consistency under the constraints, the diversity among ensemble members, and the overall quality of the ensemble. Our proposed framework brings together two distinct yet interrelated themes from clustering: ensemble clustering and semi-supervised clustering. We study different techniques for searching for high-quality solutions. Experiments on benchmark datasets demonstrate the effectiveness of our framework.
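As a rough illustration of the selection problem described in the abstract, the sketch below scores a subset of base clusterings by constraint consistency, quality, and diversity, then picks members greedily. It is not the paper's formulation: the Rand index as the agreement measure, the equal-weight additive objective (`alpha`, `beta`, `gamma`), and the greedy search are all assumptions made for this example, and every function name is hypothetical.

```python
from itertools import combinations


def constraint_consistency(labels, must_link, cannot_link):
    """Fraction of must-link / cannot-link constraints a clustering satisfies."""
    satisfied = sum(labels[i] == labels[j] for i, j in must_link)
    satisfied += sum(labels[i] != labels[j] for i, j in cannot_link)
    total = len(must_link) + len(cannot_link)
    return satisfied / total if total else 1.0


def rand_index(a, b):
    """Share of instance pairs on which two clusterings agree (1.0 = same partition)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def subset_score(subset, ensemble, must_link, cannot_link,
                 alpha=1.0, beta=1.0, gamma=1.0):
    """Assumed objective: weighted sum of consistency, quality, and diversity."""
    members = [ensemble[k] for k in subset]
    # Consistency: average constraint satisfaction of the selected members.
    consistency = sum(
        constraint_consistency(m, must_link, cannot_link) for m in members
    ) / len(members)
    # Quality: average agreement of each selected member with the full ensemble.
    quality = sum(
        sum(rand_index(m, other) for other in ensemble) / len(ensemble)
        for m in members
    ) / len(members)
    # Diversity: average pairwise disagreement among the selected members.
    member_pairs = list(combinations(members, 2))
    diversity = (
        sum(1.0 - rand_index(p, q) for p, q in member_pairs) / len(member_pairs)
        if member_pairs else 0.0
    )
    return alpha * consistency + beta * quality + gamma * diversity


def greedy_select(ensemble, must_link, cannot_link, size):
    """Greedy heuristic: repeatedly add the member that most improves the score."""
    selected, remaining = [], set(range(len(ensemble)))
    while len(selected) < size and remaining:
        best = max(
            sorted(remaining),  # sorted so ties resolve to the lowest index
            key=lambda k: subset_score(selected + [k], ensemble,
                                       must_link, cannot_link),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (invented for illustration): four clusterings of four instances.
ensemble = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
]
must_link = [(0, 1)]     # instances 0 and 1 should share a cluster
cannot_link = [(0, 2)]   # instances 0 and 2 should be separated
chosen = greedy_select(ensemble, must_link, cannot_link, size=2)
```

Greedy search is only one of many ways to explore the subset space. The weights trade off the three criteria, so raising `gamma`, for example, favors members that disagree with those already chosen.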

Details

Original language: English
Pages (from-to): 59-70
Number of pages: 12
Journal: Neurocomputing
Volume: 235
Publication status: Published - 26 Apr 2017
MoE publication type: A1 Journal article-refereed

Research areas

  • Cluster ensemble, Constraint, Ensemble selection, Semi-supervised

ID: 10957158