Projects per year
Abstract
We introduce a novel problem for diversityaware clustering. We assume that the potential cluster centers belong to a set of groups defined by protected attributes, such as ethnicity, gender, etc. We then ask to find a minimumcost clustering of the data into k clusters so that a specified minimum number of cluster centers are chosen from each group. We thus require that all groups are represented in the clustering solution as cluster centers, according to specified requirements. More precisely, we are given a set of clients C, a set of facilities, a collection F= { F_{1}, ⋯, F_{t}} of facility groups, a budget k, and a set of lowerbound thresholds R= { r_{1}, ⋯, r_{t}}, one for each group in F. The diversityaware kmedian problem asks to find a set S of k facilities in such that  S∩ F_{i} ≥ r_{i}, that is, at least r_{i} centers in S are from group F_{i}, and the kmedian cost ∑ _{c}_{∈}_{C}min _{s}_{∈}_{S}d(c, s) is minimized. We show that in the general case where the facility groups may overlap, the diversityaware kmedian problem is NP hard, fixedparameter intractable with respect to parameter k, and inapproximable to any multiplicative factor. On the other hand, when the facility groups are disjoint, approximation algorithms can be obtained by reduction to the matroid median and redblue median problems. Experimentally, we evaluate our approximation methods for the tractable cases, and present a relaxationbased heuristic for the theoretically intractable case, which can provide highquality and efficient solutions for realworld datasets.
Original language  English 

Title of host publication  Machine Learning and Knowledge Discovery in Databases. Research Track  European Conference, ECML PKDD 2021, Proceedings 
Editors  Nuria Oliver, Fernando PérezCruz, Stefan Kramer, Jesse Read, Jose A. Lozano 
Publisher  Springer 
Pages  765780 
Number of pages  16 
ISBN (Print)  9783030865191 
DOIs  
Publication status  Published  2021 
MoE publication type  A4 Conference publication 
Event  European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases  Virtual, Online Duration: 13 Sept 2021 → 17 Sept 2021 
Publication series
Name  Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 

Publisher  Springer 
Volume  12976 LNAI 
ISSN (Print)  03029743 
ISSN (Electronic)  16113349 
Conference
Conference  European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases 

Abbreviated title  ECML PKDD 
City  Virtual, Online 
Period  13/09/2021 → 17/09/2021 
Keywords
 Algorithmic bias
 Algorithmic fairness
 Diversityaware clustering
 Fair clustering
Fingerprint
Dive into the research topics of 'DiversityAware kmedian: Clustering with Fair Center Representation'. Together they form a unique fingerprint.
SoBigDataPlusPlus: Integrated Infrastructure for Social Mining and Big Data Analytics
Lampinen, J., Roy, C. & Bhattacharya, K.
01/01/2020 → 31/12/2024
Project: EU: Framework programmes funding

MLDB: Model Management Systems: Machine learning meets Database Systems
Gionis, A., Aslay, C., Ciaperoni, M., Xiao, H., Matakos, A. & Muniyappa, S.
01/09/2019 → 31/08/2023
Project: Academy of Finland: Other research funding

Adaptive and intelligent data
Gionis, A., Ordozgoiti Rubio, B., Zhang, G. & Muniyappa, S.
01/01/2018 → 30/06/2022
Project: Academy of Finland: Other research funding