Maximizing diversity over clustered data*

Guangyi Zhang, Aristides Gionis

Tutkimustuotos: Artikkeli kirjassa/konferenssijulkaisussaConference contributionScientificvertaisarvioitu

53 Lataukset (Pure)

Abstrakti

Maximum diversity aims at selecting a diverse set of high-quality objects from a collection, which is a fundamental problem and has a wide range of applications, e.g., in Web search. Diversity under a uniform or partition matroid constraint naturally describes useful cardinality or budget requirements, and admits simple approximation algorithms [5]. When applied to clustered data, however, popular algorithms such as picking objects iteratively and performing local search lose their approximation guarantees towards maximum intra-cluster diversity because they fail to optimize the objective in a global manner. We propose an algorithm that greedily adds a pair of objects instead of a singleton, and which attains a constant approximation factor over clustered data. We further extend the algorithm to the case of monotone and submodular quality function, and under a partition matroid constraint. We also devise a technique to make our algorithm scalable, and on the way we obtain a modification that gives better solutions in practice while maintaining the approximation guarantee in theory. Our algorithm achieves excellent performance, compared to strong baselines in a mix of synthetic and real-world datasets.

AlkuperäiskieliEnglanti
OtsikkoProceedings of the 2020 SIAM International Conference on Data Mining, SDM 2020
ToimittajatCarlotta Demeniconi, Nitesh Chawla
KustantajaSociety for Industrial and Applied Mathematics Publications
Sivut649-657
Sivumäärä9
ISBN (elektroninen)9781611976236
DOI - pysyväislinkit
TilaJulkaistu - 2020
OKM-julkaisutyyppiA4 Artikkeli konferenssijulkaisuussa
TapahtumaSIAM International Conference on Data Mining - Cincinnati, Yhdysvallat
Kesto: 7 toukokuuta 20209 toukokuuta 2020

Conference

ConferenceSIAM International Conference on Data Mining
LyhennettäSDM
MaaYhdysvallat
KaupunkiCincinnati
Ajanjakso07/05/202009/05/2020

Sormenjälki Sukella tutkimusaiheisiin 'Maximizing diversity over clustered data*'. Ne muodostavat yhdessä ainutlaatuisen sormenjäljen.

  • Projektit

    • 1 Päättynyt

    Active knowledge discovery in graphs

    Aslay, C., Gionis, A., Ordozgoiti Rubio, B. & Xiao, H.

    01/01/201831/12/2019

    Projekti: Academy of Finland: Other research funding

    Siteeraa tätä

    Zhang, G., & Gionis, A. (2020). Maximizing diversity over clustered data*. teoksessa C. Demeniconi, & N. Chawla (Toimittajat), Proceedings of the 2020 SIAM International Conference on Data Mining, SDM 2020 (Sivut 649-657). Society for Industrial and Applied Mathematics Publications. https://doi.org/10.1137/1.9781611976236.73