The Automated Detection of Racist Discourse in Dutch Social Media

Stéphan Tulkens; Lisa Hilte; Elise Lodewyckx; Ben Verhoeven; Walter Daelemans

Authors

Stéphan Tulkens, CLiPS, University of Antwerp
Lisa Hilte, CLiPS, University of Antwerp
Elise Lodewyckx, CLiPS, University of Antwerp
Ben Verhoeven, CLiPS, University of Antwerp
Walter Daelemans, CLiPS, University of Antwerp

Abstract

We present two experiments on the automated detection of racist discourse in Dutch social media. In both experiments, multiple classifiers are trained on the same training set, which consists of Dutch posts retrieved from two public Belgian social media pages that are likely to attract racist reactions. The posts were labeled as racist or non-racist by multiple annotators, who reached an acceptable agreement score. All classification models use the Support Vector Machine algorithm but differ in their (sets of) linguistic features, which can be lexical, stylistic, or dictionary-based. In the first experiment, the models are evaluated on a test set of unseen comments retrieved from the same pages as the training set (and thus also skewed towards racism). In the second experiment, the same models are tested on an alternative test set of more neutral comments, retrieved from the social media page of a Belgian newspaper. In both experiments, the best-performing model relies on a dictionary containing different word categories specifically related to racist discourse. It reaches an F-score of 0.47 (exp. 1) and 0.40 (exp. 2) for the racist class, and ROC Area Under Curve scores of 0.64 (exp. 1) and 0.73 (exp. 2). The dictionaries, code, and the procedure for requesting the corpus are available at: https://github.com/clips/hades
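
As an illustration of two ideas named in the abstract, dictionary-based features and the F-score for a single class, the following is a minimal sketch and not the authors' implementation. The dictionary contents, the category names, and the function names `dictionary_features` and `f_score` are hypothetical placeholders; the actual dictionaries and code are in the repository linked above, and a real pipeline would feed such feature vectors to a Support Vector Machine.

```python
from collections import Counter

# Hypothetical placeholder dictionary: category name -> set of lowercase terms.
# These are NOT the word categories used in the paper.
TOY_DICTIONARY = {
    "category_a": {"alpha", "beta"},
    "category_b": {"gamma"},
}


def dictionary_features(text, dictionary=TOY_DICTIONARY):
    """Return one relative frequency per category, in sorted category order."""
    tokens = text.lower().split()
    counts = Counter()
    for token in tokens:
        for category, terms in dictionary.items():
            if token in terms:
                counts[category] += 1
    total = max(len(tokens), 1)  # avoid division by zero on empty input
    return [counts[category] / total for category in sorted(dictionary)]


def f_score(gold, predicted, positive="racist"):
    """Compute the F1 score for the `positive` class only."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


features = dictionary_features("Alpha meets gamma today")
gold = ["racist", "racist", "non-racist", "non-racist"]
predicted = ["racist", "non-racist", "racist", "non-racist"]
score = f_score(gold, predicted)
```

Reporting the F-score for the racist class alone, as the abstract does, keeps the majority non-racist class from masking weak performance on the class of interest.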
