The Automated Detection of Racist Discourse in Dutch Social Media


  • Stéphan Tulkens, CLiPS, University of Antwerp
  • Lisa Hilte, CLiPS, University of Antwerp
  • Elise Lodewyckx, CLiPS, University of Antwerp
  • Ben Verhoeven, CLiPS, University of Antwerp
  • Walter Daelemans, CLiPS, University of Antwerp


We present two experiments on the automated detection of racist discourse in Dutch social media. In both experiments, multiple classifiers are trained on the same training set, which consists of Dutch posts retrieved from two public Belgian social media pages that are likely to attract racist reactions. The posts were labeled as racist or non-racist by multiple annotators, who reached an acceptable agreement score. All classification models use the Support Vector Machine algorithm but differ in the (sets of) linguistic features they use, which can be lexical, stylistic or dictionary-based. In the first experiment, the models are evaluated on a test set of unseen comments retrieved from the same pages as the training set (and thus also skewed towards racism). In the second experiment, the same models are tested on an alternative test set of more neutral comments, retrieved from the social media page of a Belgian newspaper. In both experiments, the best-performing model relies on a dictionary containing different word categories specifically related to racist discourse. It reaches an F-score of 0.47 (exp. 1) and 0.40 (exp. 2) for the racist class, and ROC Area Under Curve scores of 0.64 (exp. 1) and 0.73 (exp. 2). The dictionaries, code, and the procedure for requesting the corpus are available at:
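As an illustration only, and not the authors' released code, the dictionary-based approach and the per-class F-score mentioned in the abstract can be sketched as follows. The category names and word lists are neutral placeholders invented for this example, and the helper names `dictionary_features` and `f_score` are hypothetical. The sketch assumes that each post is represented by the relative frequency of each dictionary category, and that such vectors would then be passed to a classifier such as a Support Vector Machine (not shown here).

```python
from collections import Counter

# Hypothetical placeholder dictionary: category name -> set of words.
# These are NOT the categories or words used in the paper.
DICTIONARY = {
    "category_a": {"alpha", "beta"},
    "category_b": {"gamma"},
}


def dictionary_features(text, dictionary=DICTIONARY):
    """Return one relative frequency per category, ordered by category name."""
    tokens = text.lower().split()  # naive whitespace tokenization (assumption)
    counts = Counter()
    for token in tokens:
        for category, words in dictionary.items():
            if token in words:
                counts[category] += 1
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[category] / total for category in sorted(dictionary)]


def f_score(gold, predicted, positive=1):
    """F-score (harmonic mean of precision and recall) for one class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Reporting the F-score for the positive class alone, as the abstract does for the racist class, keeps the majority class from dominating the figure when the test set is skewed.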




How to Cite

Tulkens, S., Hilte, L., Lodewyckx, E., Verhoeven, B., & Daelemans, W. (2016). The Automated Detection of Racist Discourse in Dutch Social Media. Computational Linguistics in the Netherlands Journal, 6, 3–20. Retrieved from



