The CLIN33 Shared Task on the Detection of Text Generated by Large Language Models


  • Pieter Fivez, University of Antwerp
  • Walter Daelemans, University of Antwerp
  • Tim Van de Cruys, KU Leuven
  • Yury Kashnitsky, Elsevier
  • Savvas Chamezopoulos, Elsevier
  • Hadi Mohammadi, University of Utrecht
  • Anastasia Giachanou, University of Utrecht
  • Ayoub Bagheri, University of Utrecht
  • Wessel Poelman, KU Leuven
  • Juraj Vladika, Technical University of Munich
  • Esther Ploeger, Aalborg University
  • Johannes Bjerva, Aalborg University
  • Florian Matthes, Technical University of Munich
  • Hans van Halteren, Radboud University


The Shared Task for CLIN33 focuses on a relatively novel yet societally relevant task: the detection of text generated by Large Language Models (LLMs). We frame this detection task as a binary classification problem (LLM-generated or not), using test data from up to six different domains and text genres for both Dutch and English. Part of this test data was withheld entirely from the contestants, including a "mystery genre" from an undisclosed domain (later revealed to be columns). Four teams submitted 11 runs with substantially different models and features. This paper gives an overview of the task setup, reports the evaluation, and provides detailed descriptions of the participating systems. Notably, the winning systems include both deep learning models and more traditional machine learning models that leverage task-specific feature engineering.




How to Cite

Fivez, P., Daelemans, W., Van de Cruys, T., Kashnitsky, Y., Chamezopoulos, S., Mohammadi, H., Giachanou, A., Bagheri, A., Poelman, W., Vladika, J., Ploeger, E., Bjerva, J., Matthes, F., & van Halteren, H. (2024). The CLIN33 Shared Task on the Detection of Text Generated by Large Language Models. Computational Linguistics in the Netherlands Journal, 13, 233–259. Retrieved from



