Projects per year
Abstract
Grouping together similar elements in datasets is a common task in data mining and machine learning. In this paper, we study streaming and parallel algorithms for correlation clustering, where each pair of elements is labeled either similar or dissimilar. The task is to partition the elements and the objective is to minimize disagreements, that is, the number of dissimilar elements grouped together and similar elements that get separated. Our main contribution is a semi-streaming algorithm that achieves a (3+ε)-approximation to the minimum number of disagreements using a single pass over the stream. In addition, the algorithm also works for dynamic streams. Our approach builds on the analysis of the PIVOT algorithm by Ailon, Charikar, and Newman [JACM'08] that obtains a 3-approximation in the centralized setting. Our design allows us to sparsify the input graph by ignoring a large portion of the nodes and edges without a large extra cost as compared to the analysis of PIVOT. This sparsification makes our technique applicable in several models of massive graph processing, such as semi-streaming and Massively Parallel Computing (MPC), where sparse graphs can typically be handled much more efficiently. Our work improves on the approximation ratio of the recent single-pass 5-approximation algorithm and on the number of passes of the recent O(1/ε)-pass (3 + ε)-approximation algorithm [Behnezhad, Charikar, Ma, Tan FOCS'22, SODA'23]. Our algorithm is also more robust and can be applied in dynamic streams. Furthermore, it is the first single pass (3 + ε)-approximation algorithm that uses polynomial post-processing time.
Original language | English |
---|---|
Title of host publication | Proceedings of the 2024 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA) |
Editors | David P. Woodruff |
Publisher | Society for Industrial and Applied Mathematics |
Pages | 2861-2880 |
Number of pages | 20 |
ISBN (Electronic) | 978-1-61197-791-2 |
DOIs | |
Publication status | Published - 2024 |
MoE publication type | A4 Conference publication |
Event | ACM-SIAM Symposium on Discrete Algorithms - Alexandria, United States Duration: 7 Jan 2024 → 10 Jan 2024 Conference number: 35 |
Conference
Conference | ACM-SIAM Symposium on Discrete Algorithms |
---|---|
Abbreviated title | SODA |
Country/Territory | United States |
City | Alexandria |
Period | 07/01/2024 → 10/01/2024 |
Fingerprint
Dive into the research topics of 'A (3 + ε)-Approximate Correlation Clustering Algorithm in Dynamic Streams'. Together they form a unique fingerprint.Projects
- 1 Finished
-
Massively Parallel Algorithms for Large-Scale Graph Problems
Uitto, J. (Principal investigator)
01/09/2020 → 31/08/2024
Project: RCF Academy Project