Abstract
Identifying clusters of similar elements in a set is a common task in data analysis. With the immense growth of data and physical limitations on single processor speed, it is necessary to find efficient parallel algorithms for clustering tasks. In this paper, we study the problem of correlation clustering in bounded arboricity graphs with respect to the Massively Parallel Computation (MPC) model. More specifically, we are given a complete graph where the edges are either positive or negative, indicating whether pairs of vertices are similar or dissimilar. The task is to partition the vertices into clusters with as few disagreements as possible. That is, we want to minimize the number of positive inter-cluster edges and negative intra-cluster edges.
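The objective described above can be made concrete with a small sketch. The function name `disagreements` and its parameters are illustrative assumptions, not taken from the paper; any vertex pair not listed in `positive_edges` is treated as a negative edge, matching the complete signed graph setting.

```python
from itertools import combinations


def disagreements(vertices, positive_edges, cluster_of):
    """Count positive inter-cluster pairs plus negative intra-cluster pairs.

    vertices: iterable of hashable vertex ids (hypothetical input format).
    positive_edges: iterable of (u, v) pairs; every other pair is negative.
    cluster_of: dict mapping each vertex to its cluster label.
    """
    positive = {frozenset(edge) for edge in positive_edges}
    cost = 0
    for u, v in combinations(vertices, 2):
        same_cluster = cluster_of[u] == cluster_of[v]
        is_positive = frozenset((u, v)) in positive
        # A pair disagrees when it is positive but split across clusters,
        # or negative but placed inside one cluster.
        if is_positive != same_cluster:
            cost += 1
    return cost
```

This brute-force count inspects all pairs, so it is only meant for checking small examples, not for the large inputs the paper targets.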
Consider an input graph G on n vertices such that the positive edges induce a λ-arboric graph. Our main result is a 3-approximation (in expectation) algorithm to correlation clustering that runs in 𝒪(log λ ⋅ poly(log log n)) MPC rounds in the strongly sublinear memory regime. This is obtained by combining structural properties of correlation clustering on bounded arboricity graphs with the insights of Fischer and Noever (SODA '18) on randomized greedy MIS and the PIVOT algorithm of Ailon, Charikar, and Newman (STOC '05). Combined with known graph matching algorithms, our structural property also implies an exact algorithm and algorithms with worst case (1+ε)-approximation guarantees in the special case of forests, where λ = 1.
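For orientation, the sequential PIVOT procedure that the result builds on can be sketched as follows. This is a minimal single-machine sketch, not the MPC algorithm of the paper: it processes vertices in a uniformly random order, which is the randomized greedy MIS view of PIVOT. The function name `pivot_clustering` and the input format are assumptions made for illustration.

```python
import random


def pivot_clustering(vertices, positive_edges, seed=None):
    """Sequential PIVOT sketch: returns a dict vertex -> pivot of its cluster."""
    rng = random.Random(seed)
    neighbors = {v: set() for v in vertices}
    for u, v in positive_edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    # A uniformly random order plays the role of repeatedly picking a
    # random pivot among the still-unclustered vertices.
    order = list(vertices)
    rng.shuffle(order)

    cluster_of = {}
    for pivot in order:
        if pivot in cluster_of:
            continue  # already absorbed by an earlier pivot
        cluster_of[pivot] = pivot
        # The pivot takes all of its still-unclustered positive neighbors.
        for w in neighbors[pivot]:
            if w not in cluster_of:
                cluster_of[w] = pivot
    return cluster_of
```

The pivots chosen this way form a maximal independent set in the positive-edge graph, which is why an analysis of randomized greedy MIS is relevant to parallelizing the procedure.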
Original language | English |
---|---|
Title | 35th International Symposium on Distributed Computing, DISC 2021 |
Editors | Seth Gilbert |
Publisher | Schloss Dagstuhl - Leibniz-Zentrum für Informatik |
Number of pages | 18 |
ISBN (electronic) | 978-3-95977-210-5 |
DOI - permanent links | |
Status | Published - 2021 |
OKM publication type | A4 Article in conference proceedings |
Event | International Symposium on Distributed Computing - Virtual, Online, Freiburg, Germany. Duration: 4 Oct 2021 → 8 Oct 2021. Conference number: 35 |
Publication series
Name | Leibniz International Proceedings in Informatics, LIPIcs |
---|---|
Publisher | Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing |
Volume | 209 |
ISSN (electronic) | 1868-8969 |
Conference
Conference | International Symposium on Distributed Computing |
---|---|
Abbreviated title | DISC |
Country/Territory | Germany |
City | Freiburg |
Period | 04/10/2021 → 08/10/2021 |
Fingerprint
Dive into the research topics of 'Massively Parallel Correlation Clustering in Bounded Arboricity Graphs'. Together they form a unique fingerprint.
Projects
- 1 Finished
- Massively Parallel Algorithms for Large-Scale Graph Problems
  Uitto, J. (Principal investigator), Cambus, M. (Project member), Latypov, R. (Project member), Pai, S. (Project member) & Zhu, X. (Project member)
  01/09/2020 → 31/08/2024
  Project: Academy of Finland: Other research funding