SuperSketch: A Multi-Dimensional Reversible Data Structure for Super Host Identification

Xuyang Jing, Hui Han, Zheng Yan, Witold Pedrycz

Research output: Contribution to journal / Article / Scientific / peer-review


Faced with massive volumes of network traffic data, effective data compression has become crucial for estimating host cardinalities and identifying super hosts. However, existing approaches face two challenges: they cannot simultaneously measure multiple types of host cardinalities, and they cannot efficiently reconstruct the addresses of super hosts. To address these challenges, we propose a novel sketch data structure, named SuperSketch, that simultaneously measures multiple types of host cardinalities in order to efficiently identify super hosts. SuperSketch has two significant characteristics: multi-dimensionality and reversibility. Multi-dimensionality enables SuperSketch to simultaneously measure Source Cardinality, Destination Cardinality and Destination Port Cardinality. Reversibility allows SuperSketch to accurately and quickly reconstruct the original addresses of super hosts once they are identified. We conduct both a theoretical analysis and a performance evaluation based on real-world network traffic. Experimental results show that SuperSketch achieves outstanding performance in multi-cardinality measurement, super host identification and host address reconstruction.
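To make the three measured quantities concrete, the following is a minimal exact baseline. It is not SuperSketch: it stores one hash set per host, so its memory grows with the traffic, which is exactly the cost a compact sketch is meant to avoid. The definitions are assumptions for illustration: Source Cardinality of a host is taken as the number of distinct sources that contact it, Destination Cardinality as the number of distinct destinations it contacts, and Destination Port Cardinality as the number of distinct destination ports it contacts. The flow tuple format, the function names and the threshold rule in `super_hosts` are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical flow record: (source IP, destination IP, destination port).
Flow = Tuple[str, str, int]


def host_cardinalities(
    flows: Iterable[Flow],
) -> Tuple[Dict[str, int], Dict[str, int], Dict[str, int]]:
    """Exact per-host cardinalities using one hash set per host.

    Assumed definitions (not taken from the paper):
      source cardinality: distinct sources contacting a destination host
      destination cardinality: distinct destinations contacted by a source host
      destination port cardinality: distinct destination ports used by a source host
    """
    srcs_of_dst: Dict[str, Set[str]] = defaultdict(set)
    dsts_of_src: Dict[str, Set[str]] = defaultdict(set)
    ports_of_src: Dict[str, Set[int]] = defaultdict(set)

    for src, dst, port in flows:
        srcs_of_dst[dst].add(src)
        dsts_of_src[src].add(dst)
        ports_of_src[src].add(port)

    def sizes(table: Dict[str, Set]) -> Dict[str, int]:
        # Replace each set by its size, i.e. the exact cardinality.
        return {host: len(members) for host, members in table.items()}

    return sizes(srcs_of_dst), sizes(dsts_of_src), sizes(ports_of_src)


def super_hosts(cardinality: Dict[str, int], threshold: int) -> Set[str]:
    """Hypothetical rule: a host is 'super' if its cardinality exceeds threshold."""
    return {host for host, count in cardinality.items() if count > threshold}


EXAMPLE_FLOWS = [
    ("10.0.0.1", "10.0.0.9", 80),
    ("10.0.0.1", "10.0.0.8", 80),
    ("10.0.0.1", "10.0.0.7", 22),
    ("10.0.0.2", "10.0.0.9", 443),
    ("10.0.0.1", "10.0.0.9", 80),  # repeated flow does not raise any cardinality
]

if __name__ == "__main__":
    src_card, dst_card, port_card = host_cardinalities(EXAMPLE_FLOWS)
    print(src_card)   # {'10.0.0.9': 2, '10.0.0.8': 1, '10.0.0.7': 1}
    print(dst_card)   # {'10.0.0.1': 3, '10.0.0.2': 1}
    print(port_card)  # {'10.0.0.1': 2, '10.0.0.2': 1}
    print(super_hosts(dst_card, threshold=2))  # {'10.0.0.1'}
```

Here the host address is simply the dictionary key, so "reconstructing" a super host is trivial. A sketch that hashes addresses into a fixed-size counter array loses that key, which is why the reversibility property described above matters.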

Original language: English
Journal: IEEE Transactions on Dependable and Secure Computing
Publication status: E-pub ahead of print (2021)
MoE publication type: A1 Journal article-refereed


  • Host Cardinality
  • Network Traffic Measurement
  • Reversible Sketch
  • Super Host Identification


