Maximizing diversity over clustered data*

Guangyi Zhang, Aristides Gionis

Research output: Chapter in Book/Report/Conference proceedingConference article in proceedingsScientificpeer-review

5 Citations (Scopus)
120 Downloads (Pure)

Abstract

Maximum diversity aims at selecting a diverse set of high-quality objects from a collection, which is a fundamental problem and has a wide range of applications, e.g., in Web search. Diversity under a uniform or partition matroid constraint naturally describes useful cardinality or budget requirements, and admits simple approximation algorithms [5]. When applied to clustered data, however, popular algorithms such as picking objects iteratively and performing local search lose their approximation guarantees towards maximum intra-cluster diversity because they fail to optimize the objective in a global manner. We propose an algorithm that greedily adds a pair of objects instead of a singleton, and which attains a constant approximation factor over clustered data. We further extend the algorithm to the case of monotone and submodular quality function, and under a partition matroid constraint. We also devise a technique to make our algorithm scalable, and on the way we obtain a modification that gives better solutions in practice while maintaining the approximation guarantee in theory. Our algorithm achieves excellent performance, compared to strong baselines in a mix of synthetic and real-world datasets.

Original languageEnglish
Title of host publicationProceedings of the 2020 SIAM International Conference on Data Mining, SDM 2020
EditorsCarlotta Demeniconi, Nitesh Chawla
PublisherSociety for Industrial and Applied Mathematics
Pages649-657
Number of pages9
ISBN (Electronic)9781611976236
DOIs
Publication statusPublished - 2020
MoE publication typeA4 Conference publication
EventSIAM International Conference on Data Mining - Cincinnati, United States
Duration: 7 May 20209 May 2020

Conference

ConferenceSIAM International Conference on Data Mining
Abbreviated titleSDM
Country/TerritoryUnited States
CityCincinnati
Period07/05/202009/05/2020

Fingerprint

Dive into the research topics of 'Maximizing diversity over clustered data*'. Together they form a unique fingerprint.
  • Active knowledge discovery in graphs

    Gionis, A. (Principal investigator), Aslay, C. (Project Member), Zhang, G. (Project Member), Ordozgoiti Rubio, B. (Project Member) & Xiao, H. (Project Member)

    01/01/201831/12/2019

    Project: Academy of Finland: Other research funding

Cite this