Abstract
Identifying clusters of similar elements in a set is a common task in data analysis. With the immense growth of data and physical limitations on single processor speed, it is necessary to find efficient parallel algorithms for clustering tasks. In this paper, we study the problem of correlation clustering in bounded arboricity graphs with respect to the Massively Parallel Computation (MPC) model. More specifically, we are given a complete graph where the edges are either positive or negative, indicating whether pairs of vertices are similar or dissimilar. The task is to partition the vertices into clusters with as few disagreements as possible. That is, we want to minimize the number of positive inter-cluster edges and negative intra-cluster edges.
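To make the objective concrete, the following minimal Python sketch counts the disagreements of a given clustering. It assumes vertices are numbered 0 to n-1, that only the positive edges are listed (every other pair is treated as negative), and that `cluster_of` maps each vertex to a cluster label. The function name and input layout are illustrative choices, not taken from the paper.

```python
from itertools import combinations


def disagreements(n, positive_edges, cluster_of):
    """Count disagreements of a clustering on a complete signed graph.

    n              -- number of vertices, labelled 0..n-1 (assumed layout)
    positive_edges -- iterable of (u, v) pairs; all other pairs are negative
    cluster_of     -- sequence or dict mapping each vertex to a cluster label
    """
    pos = {frozenset(e) for e in positive_edges}
    cost = 0
    for u, v in combinations(range(n), 2):
        same_cluster = cluster_of[u] == cluster_of[v]
        is_positive = frozenset((u, v)) in pos
        # A positive edge across clusters or a negative edge inside a
        # cluster each counts as one disagreement.
        if is_positive != same_cluster:
            cost += 1
    return cost


# Positive edges form the path 0 - 1 - 2; the pair (0, 2) is negative.
path = [(0, 1), (1, 2)]
print(disagreements(3, path, [0, 0, 0]))  # one cluster
print(disagreements(3, path, [0, 0, 1]))  # clusters {0, 1} and {2}
print(disagreements(3, path, [0, 1, 2]))  # all singletons
```

This brute-force check visits all pairs, so it is only meant for small illustrative inputs.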
Consider an input graph G on n vertices such that the positive edges induce a λ-arboric graph. Our main result is a 3-approximation (in expectation) algorithm to correlation clustering that runs in 𝒪(log λ ⋅ poly(log log n)) MPC rounds in the strongly sublinear memory regime. This is obtained by combining structural properties of correlation clustering on bounded arboricity graphs with the insights of Fischer and Noever (SODA '18) on randomized greedy MIS and the PIVOT algorithm of Ailon, Charikar, and Newman (STOC '05). Combined with known graph matching algorithms, our structural property also implies an exact algorithm and algorithms with worst-case (1+ε)-approximation guarantees in the special case of forests, where λ = 1.
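For orientation, the sequential PIVOT procedure mentioned above can be sketched as follows: process the vertices in a uniformly random order, and let each still-unclustered vertex become a pivot that takes its unclustered positive neighbours into its cluster. The pivots chosen this way form a randomized greedy maximal independent set in the positive subgraph. This is only a single-machine illustration under assumed inputs (vertices 0 to n-1, a list of positive edges); it is not the MPC algorithm of the paper, and the function name and `seed` parameter are hypothetical.

```python
import random


def pivot(n, positive_edges, seed=None):
    """Sequential PIVOT sketch; returns a dict vertex -> pivot of its cluster."""
    rng = random.Random(seed)

    # Adjacency lists of the positive subgraph.
    adj = {v: set() for v in range(n)}
    for u, v in positive_edges:
        adj[u].add(v)
        adj[v].add(u)

    # A uniformly random permutation fixes the order of pivot candidates.
    order = list(range(n))
    rng.shuffle(order)

    cluster_of = {}
    for p in order:
        if p in cluster_of:
            continue  # already absorbed by an earlier pivot
        cluster_of[p] = p  # p becomes a pivot and labels its own cluster
        for w in adj[p]:
            if w not in cluster_of:
                cluster_of[w] = p  # unclustered positive neighbours join p
    return cluster_of


# Positive edges form the path 0 - 1 - 2 - 3.
print(pivot(4, [(0, 1), (1, 2), (2, 3)], seed=1))
```

The clustering depends on the random order, so different seeds can give different results; the 3-approximation stated in the abstract holds in expectation over that randomness.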
Original language  English 

Title of host publication  35th International Symposium on Distributed Computing, DISC 2021 
Editors  Seth Gilbert 
Publisher  Schloss Dagstuhl - Leibniz-Zentrum für Informatik 
Number of pages  18 
ISBN (Electronic)  978-3-95977-210-5 
DOIs  
Publication status  Published - 2021 
MoE publication type  A4 Conference publication 
Event  International Symposium on Distributed Computing - Virtual, Online, Freiburg, Germany. Duration: 4 Oct 2021 → 8 Oct 2021. Conference number: 35 
Publication series
Name  Leibniz International Proceedings in Informatics, LIPIcs 

Publisher  Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing 
Volume  209 
ISSN (Electronic)  1868-8969 
Conference
Conference  International Symposium on Distributed Computing 

Abbreviated title  DISC 
Country/Territory  Germany 
City  Freiburg 
Period  04/10/2021 → 08/10/2021 
Fingerprint
Dive into the research topics of 'Massively Parallel Correlation Clustering in Bounded Arboricity Graphs'. Together they form a unique fingerprint.
Projects
 1 Active

Massively Parallel Algorithms for Large-Scale Graph Problems
Uitto, J., Cambus, M., Latypov, R. & Pai, S.
01/09/2020 → 31/08/2024
Project: Academy of Finland: Other research funding