Improving Data Generalization with Variational Autoencoders for Network Traffic Anomaly Detection

Mehrnoosh Monshizadeh, Vikramajeet Khatri, Marah Gamdou, Raimo Kantola, Zheng Yan

Research output: Contribution to journal › Article › Scientific › peer-review


Abstract

Deep generative models have become increasingly popular in domains such as image processing, yet they rarely appear in cybersecurity. These models are mainly applied to dimensionality reduction. They have only marginally been used to address other challenges, such as poor data generalization and the overfitting inherited from feature selection methods. To address these challenges, we propose a combined architecture comprising a Conditional Variational AutoEncoder (CVAE) and a Random Forest (RF) classifier. The architecture automatically learns similarities among input features, models the data distribution in order to extract discriminative features from the original features, and finally classifies various types of attacks. The CVAE introduces the labels of traffic packets into the latent space to better learn variations in the input samples and to distinguish the data characteristics of each class. This avoids confusion between classes while the model learns the overall data distribution. Compared with feature selection mechanisms such as Support Vector Machine Online (SVMo), the proposed architecture shows a considerable performance improvement across various evaluation metrics. To verify the versatility of the proposed architecture, experiments were conducted on two publicly available datasets.
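The abstract describes the CVAE only at a high level. As a hedged illustration of the general technique (not the authors' implementation), the NumPy sketch below shows how a CVAE conditions both its encoder and decoder on a one-hot class label. It also computes the usual objective: a reconstruction term plus the KL divergence between q(z | x, y) and a standard normal prior. The layer sizes, tanh activations, squared-error reconstruction loss, standard-normal prior, random data, and the names `TinyCVAE`, `one_hot`, and `kl_to_standard_normal` are all assumptions made for illustration. The network is untrained.

```python
import numpy as np

rng = np.random.default_rng(0)


def one_hot(labels, num_classes):
    """Encode integer class labels as one-hot rows."""
    out = np.zeros((labels.shape[0], num_classes))
    out[np.arange(labels.shape[0]), labels] = 1.0
    return out


def kl_to_standard_normal(mu, logvar):
    """Closed-form KL(N(mu, exp(logvar)) || N(0, I)), one value per sample."""
    return -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar), axis=1)


def init_layer(n_in, n_out):
    """Small random weights and zero bias for one dense layer (assumed init)."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)


class TinyCVAE:
    """Hypothetical, untrained one-hidden-layer CVAE forward pass."""

    def __init__(self, n_features, n_classes, n_hidden=16, n_latent=4):
        self.n_classes = n_classes
        # Encoder q(z | x, y): input is the features concatenated with the label.
        self.enc = init_layer(n_features + n_classes, n_hidden)
        self.mu_head = init_layer(n_hidden, n_latent)
        self.logvar_head = init_layer(n_hidden, n_latent)
        # Decoder p(x | z, y): input is the latent code concatenated with the label.
        self.dec = init_layer(n_latent + n_classes, n_hidden)
        self.out = init_layer(n_hidden, n_features)

    @staticmethod
    def dense(x, layer):
        w, b = layer
        return x @ w + b

    def encode(self, x, y_onehot):
        h = np.tanh(self.dense(np.hstack([x, y_onehot]), self.enc))
        return self.dense(h, self.mu_head), self.dense(h, self.logvar_head)

    def decode(self, z, y_onehot):
        h = np.tanh(self.dense(np.hstack([z, y_onehot]), self.dec))
        return self.dense(h, self.out)

    def loss(self, x, labels):
        """Return the mean negative-ELBO-style loss and the latent means."""
        y = one_hot(labels, self.n_classes)
        mu, logvar = self.encode(x, y)
        # Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I).
        eps = rng.standard_normal(mu.shape)
        z = mu + np.exp(0.5 * logvar) * eps
        x_hat = self.decode(z, y)
        # Squared-error reconstruction (assumes a Gaussian likelihood).
        recon = np.sum((x - x_hat) ** 2, axis=1)
        return np.mean(recon + kl_to_standard_normal(mu, logvar)), mu


# Usage with random stand-in data: 8 samples, 10 features, 3 classes.
x = rng.standard_normal((8, 10))
labels = rng.integers(0, 3, size=8)
model = TinyCVAE(n_features=10, n_classes=3)
loss, latent_means = model.loss(x, labels)
print(latent_means.shape)  # prints (8, 4)
```

In the architecture the abstract describes, latent representations such as `latent_means` would then feed a Random Forest classifier (for example scikit-learn's `RandomForestClassifier`). The abstract does not say how class labels are supplied to the CVAE at inference time, when they are unknown.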

Original language: English
Article number: 9399440
Pages (from-to): 56893-56907
Number of pages: 15
Journal: IEEE Access
Volume: 9
Publication status: Published (2021)
MoE publication type: A1 Journal article-refereed

Keywords

  • Anomaly Detection
  • Classification algorithms
  • Data Mining
  • Feature extraction
  • Feature Selection
  • Machine Learning
  • Measurement
  • Random forests
  • Security
  • Telecommunication traffic
  • Vegetation

