A (3 + ε)-Approximate Correlation Clustering Algorithm in Dynamic Streams

Mélanie Cambus*, Fabian Kuhn, Etna Lindy*, Shreyas Pai*, Jara Uitto*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference article in proceedingsScientificpeer-review

4 Citations (Scopus)

Abstract

Grouping together similar elements in datasets is a common task in data mining and machine learning. In this paper, we study streaming and parallel algorithms for correlation clustering, where each pair of elements is labeled either similar or dissimilar. The task is to partition the elements and the objective is to minimize disagreements, that is, the number of dissimilar elements grouped together and similar elements that get separated. Our main contribution is a semi-streaming algorithm that achieves a (3+ε)-approximation to the minimum number of disagreements using a single pass over the stream. In addition, the algorithm also works for dynamic streams. Our approach builds on the analysis of the PIVOT algorithm by Ailon, Charikar, and Newman [JACM'08] that obtains a 3-approximation in the centralized setting. Our design allows us to sparsify the input graph by ignoring a large portion of the nodes and edges without a large extra cost as compared to the analysis of PIVOT. This sparsification makes our technique applicable in several models of massive graph processing, such as semi-streaming and Massively Parallel Computing (MPC), where sparse graphs can typically be handled much more efficiently. Our work improves on the approximation ratio of the recent single-pass 5-approximation algorithm and on the number of passes of the recent O(1/ε)-pass (3 + ε)-approximation algorithm [Behnezhad, Charikar, Ma, Tan FOCS'22, SODA'23]. Our algorithm is also more robust and can be applied in dynamic streams. Furthermore, it is the first single pass (3 + ε)-approximation algorithm that uses polynomial post-processing time.

Original languageEnglish
Title of host publicationProceedings of the 2024 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA)
EditorsDavid P. Woodruff
PublisherSociety for Industrial and Applied Mathematics
Pages2861-2880
Number of pages20
ISBN (Electronic)978-1-61197-791-2
DOIs
Publication statusPublished - 2024
MoE publication typeA4 Conference publication
EventACM-SIAM Symposium on Discrete Algorithms - Alexandria, United States
Duration: 7 Jan 202410 Jan 2024
Conference number: 35

Conference

ConferenceACM-SIAM Symposium on Discrete Algorithms
Abbreviated titleSODA
Country/TerritoryUnited States
CityAlexandria
Period07/01/202410/01/2024

Fingerprint

Dive into the research topics of 'A (3 + ε)-Approximate Correlation Clustering Algorithm in Dynamic Streams'. Together they form a unique fingerprint.

Cite this