Abstract
Stylometry can be used to profile or deanonymize authors against their will based on writing style. Style transfer provides a defence. Current techniques typically use either encoder-decoder architectures or rule-based algorithms. Crucially, style transfer must reliably retain original semantic content to be actually deployable. We conduct a multifaceted evaluation of three state-of-the-art encoder-decoder style transfer techniques, and show that all fail at semantic retainment. In particular, they do not produce appropriate paraphrases, but only retain original content in the trivial case of exactly reproducing the text. To mitigate this problem, we propose ParChoice: a technique based on the combinatorial application of multiple paraphrasing algorithms. ParChoice strongly outperforms the encoder-decoder baselines in semantic retainment. Additionally, compared to baselines that achieve non-negligible semantic retainment, ParChoice has superior style transfer performance. We also apply ParChoice to multi-author style imitation (not considered by prior work), where we achieve up to 75% imitation success among five authors. Furthermore, when compared to two state-of-the-art rule-based style transfer techniques, ParChoice has markedly better semantic retainment. Combining ParChoice with the best performing rule-based baseline (Mutant-X [34]) also reaches the highest style transfer success on the Brennan-Greenstadt and Extended-Brennan-Greenstadt corpora, with much less impact on original meaning than when using the rule-based baseline techniques alone. Finally, we highlight a critical problem that afflicts all current style transfer techniques: the adversary can use the same technique for thwarting style transfer via adversarial training. We show that adding randomness to style transfer helps to mitigate the effectiveness of adversarial training.
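As a rough illustration of what "combinatorial application of multiple paraphrasing algorithms" with added randomness could look like, the following is a minimal sketch and not the authors' implementation. The two toy paraphrasers (`swap_synonyms`, `expand_contractions`), the scorer `toy_style_score`, and the `top_k` parameter are all hypothetical stand-ins invented for this example; the abstract does not specify how ParChoice generates or selects candidates.

```python
import random
from typing import Callable, List

# A paraphraser maps a text to a list of variants (the original included).
Paraphraser = Callable[[str], List[str]]


def swap_synonyms(text: str) -> List[str]:
    # Hypothetical toy rule: offer "large" as an alternative to "big".
    return [text, text.replace("big", "large")]


def expand_contractions(text: str) -> List[str]:
    # Hypothetical toy rule: offer "do not" as an alternative to "don't".
    return [text, text.replace("don't", "do not")]


def candidates(text: str, paraphrasers: List[Paraphraser]) -> List[str]:
    """Apply each paraphraser to every variant produced so far,
    yielding all combinations of the individual rewrites."""
    pool = [text]
    for paraphrase in paraphrasers:
        pool = [variant for current in pool for variant in paraphrase(current)]
    # Remove duplicates while preserving order.
    return list(dict.fromkeys(pool))


def toy_style_score(text: str) -> int:
    # Hypothetical stand-in for a target-style scorer: counts "target-style" phrases.
    return sum(phrase in text for phrase in ("large", "do not"))


def choose(pool: List[str], score: Callable[[str], float],
           top_k: int, rng: random.Random) -> str:
    """Pick randomly among the top_k best-scoring candidates, so the
    output is not a deterministic function of the input."""
    ranked = sorted(pool, key=score, reverse=True)
    return rng.choice(ranked[:top_k])


if __name__ == "__main__":
    pool = candidates("I don't like big dogs", [swap_synonyms, expand_contractions])
    print(choose(pool, toy_style_score, top_k=2, rng=random.Random()))
```

With `top_k=1` the choice is deterministic (always the best-scoring candidate); larger values trade some style transfer strength for unpredictability, which is the kind of randomness the abstract reports as helping against adversarial training.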
Original language | English |
---|---|
Title of host publication | Proceedings on Privacy Enhancing Technologies |
Publisher | De Gruyter |
Pages | 175-195 |
Number of pages | 20 |
Publication status | Published - 17 Aug 2020 |
MoE publication type | A4 Conference publication |
Event | Privacy Enhancing Technologies Symposium, Montreal, Canada; Duration: 15 Jul 2020 → 19 Jul 2020; Conference number: 20 |
Publication series
Name | Proceedings on Privacy Enhancing Technologies |
---|---|
Publisher | De Gruyter |
Number | 4 |
Volume | 2020 |
ISSN (Electronic) | 2299-0984 |
Conference
Conference | Privacy Enhancing Technologies Symposium |
---|---|
Abbreviated title | PETS |
Country/Territory | Canada |
City | Montreal |
Period | 15/07/2020 → 19/07/2020 |
Keywords
- style transfer
- style imitation
- stylometry
- adversarial stylometry
- author profiling
- profiling
- deanonymization
- model evasion