Natural Language Processing in Adversarial Settings and Beyond: Benefits and Risks of Text Classification, Transformation, and Representation

Research output: Thesis › Doctoral Thesis › Collection of Articles


Natural language processing (NLP) has developed significantly in recent years, with important consequences that extend beyond its immediate domain. The increased availability of NLP technologies has repercussions for information security and privacy in particular, both positive and negative. For example, classifying text based on semantic content or writing style has many benign uses, but can also be applied adversarially for censorship or privacy violations. Conversely, automatic text transformation can be used to perform model evasion attacks as well as to defend against illegitimate profiling of text. This dissertation investigates the performance and security implications of NLP techniques across multiple tasks, with a focus on adversarial settings. We first explored how well state-of-the-art text classification techniques can detect various types of adversarial text, such as deception or hate speech. Here, we observed that classifiers tend to latch onto simple features regardless of model architecture, which can make them unreliable and vulnerable to evasion. Increasing model complexity alone is not enough; improving performance also requires a larger training dataset. We further demonstrated that text transformation can successfully be used to expand training data artificially. However, some adversarial text classes, such as deception, are likely too context-dependent to be reliably detected by available techniques. We also applied text transformation to counteract classification, from both an attacker's and a defender's perspective. A major finding was that deep neural networks (DNNs) were unreliable at maintaining semantic content across transformations, in contrast to rule-based techniques, which allow restrictive control of the output. On the other hand, DNNs are more flexible and can generate more variable texts than symbolic rules alone.
This illustrates the complementary relationship between DNN-based and rule-based NLP, which speaks against discarding either. For mitigating model evasion, we show adversarial training to be beneficial against both kinds of techniques. Across both text classification and transformation tasks, the importance of input data representation becomes apparent. This has broad relevance in a variety of NLP settings. Motivated by recent developments in linguistic theory, we show that effective semantic representations can be attained with far fewer semantic roles than in prior formalisms. Based on this, we present a novel format that permits easy but highly detailed information retrieval, as well as straightforward integration with DNNs as vectorized input. In addition to demonstrating the format's ability to retain information despite its structural simplicity, we applied it to parallel corpus extraction and text transformation tasks, which resulted in multiple novel datasets that we provide as open access.
Translated title of the contribution: Natural Language Processing in Adversarial Settings and Beyond: Benefits and Risks of Text Classification, Transformation, and Representation
Original language: English
Qualification: Doctor's degree
Awarding Institution
  • Aalto University
  • Asokan, N., Supervising Professor
  • Asokan, N., Thesis Advisor
Print ISBNs: 978-952-64-0443-1
Electronic ISBNs: 978-952-64-0444-8
Publication status: Published - 2021
MoE publication type: G5 Doctoral dissertation (article)


  • text classification
  • text transformation
  • text representation
  • model evasion
  • deception
  • hate speech
  • stylometry
  • style transfer
  • data augmentation
  • semantics

