All You Need Is "Love": Evading Hate Speech Detection

Tommi Gröndahl, Luca Pajola, Mika Juuti, Mauro Conti, N. Asokan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

38 Citations (Scopus)
232 Downloads (Pure)

Abstract

With the spread of social networks and their unfortunate use for hate speech, automatic detection of the latter has become a pressing problem. In this paper, we reproduce seven state-of-the-art hate speech detection models from prior work, and show that they perform well only when tested on the same type of data they were trained on. Based on these results, we argue that for successful hate speech detection, model architecture is less important than the type of data and labeling criteria. We further show that all proposed detection techniques are brittle against adversaries who can (automatically) insert typos, change word boundaries or add innocuous words to the original hate speech. A combination of these methods is also effective against Google Perspective, a cutting-edge solution from industry. Our experiments demonstrate that adversarial training does not completely mitigate the attacks, and using character-level features makes the models systematically more attack-resistant than using word-level features.
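The abstract names three families of text perturbation: inserting typos, changing word boundaries, and adding innocuous words. The Python sketch below shows one simple, hypothetical version of each, plus a toy comparison of word-level and character-trigram feature overlap to illustrate the intuition behind the character-level robustness finding. It is not the authors' implementation: the function names (`swap_inner_chars`, `insert_typos`, `remove_spaces`, `add_word`, `char_ngrams`, `jaccard`), the adjacent-swap typo strategy, removing all whitespace, and the choice of "love" as the appended word (suggested only by the title) are all assumptions made for illustration.

```python
import random

# Hypothetical sketch of the three perturbation families named in the
# abstract; NOT the paper's implementation. All names are illustrative.


def swap_inner_chars(word: str, rng: random.Random) -> str:
    """Typo perturbation (assumed variant): swap two adjacent inner
    characters, keeping the first and last letters in place."""
    if len(word) < 4:
        return word  # too short to swap inner letters
    i = rng.randrange(1, len(word) - 2)  # i and i + 1 are both inner positions
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def insert_typos(text: str, rng: random.Random) -> str:
    """Apply the typo perturbation to every whitespace-separated word."""
    return " ".join(swap_inner_chars(w, rng) for w in text.split())


def remove_spaces(text: str) -> str:
    """Word-boundary perturbation (assumed variant): delete whitespace so
    the sentence becomes a single token for a whitespace tokenizer."""
    return "".join(text.split())


def add_word(text: str, word: str = "love") -> str:
    """Innocuous-word perturbation: append a benign word (assumed: 'love')."""
    return f"{text} {word}"


def char_ngrams(text: str, n: int = 3) -> set[str]:
    """Set of lowercase character n-grams (spaces included)."""
    s = text.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two feature sets (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


if __name__ == "__main__":
    rng = random.Random(0)
    original = "you are a terrible person"
    joined = remove_spaces(original)
    print(joined)  # youareaterribleperson
    print(add_word(insert_typos(original, rng)))
    # Word-level features share nothing after the boundary change...
    print(jaccard(set(original.split()), set(joined.split())))  # 0.0
    # ...while many character trigrams survive it.
    print(jaccard(char_ngrams(original), char_ngrams(joined)))
```

Removing spaces leaves no word token in common with the original, whereas character trigrams such as "ter" or "son" still match. This toy overlap measure only conveys the intuition; it does not reproduce the paper's experiments or the actual features the detection models use.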
Original language: English
Title of host publication: Proceedings of the 11th ACM Workshop on Artificial Intelligence and Security
Place of publication: New York
Publisher: ACM
Pages: 2-12
Number of pages: 10
ISBN (Electronic): 978-1-4503-6004-3
Publication status: Published - 2018
OKM publication type: A4 Article in conference proceedings
Event: ACM Workshop on Artificial Intelligence and Security - Toronto, Canada
Duration: 19 Oct 2018 → 19 Oct 2018
Conference number: 11

Workshop

Workshop: ACM Workshop on Artificial Intelligence and Security
Abbreviated title: AISec
Country/Territory: Canada
City: Toronto
Period: 19/10/2018 → 19/10/2018

