Native-data Models for Detecting and Correcting Errors in Learners’ Dutch

  • Lennart Kloppenburg, CLCG, University of Groningen, The Netherlands
  • Malvina Nissim, CLCG, University of Groningen, The Netherlands

Abstract

We address the task of automatically correcting errors in text written by learners of Dutch by modelling the language usage of native speakers. Specifically, we concentrate on two word classes, prepositions and determiners, focusing on articles for the latter. For each of these two word classes, we build two models that exploit a large corpus of Dutch. The first is a binary model that detects whether a preposition/article should be used at all in a given position. The second is a multiclass model that selects the appropriate preposition/article when one should be used. The models are tested on native as well as learners’ data. For the latter, we use a crowdsourcing strategy to elicit native judgements. On native test data the models perform very well, showing that we can model preposition usage appropriately. However, the evaluation on learners’ data shows that the models might be excessively tuned towards native data, and that there is still room for improving their adaptation to the intrinsic characteristics of learners’ data. Reflecting on these results, we envisage various ways of improving performance and report them in the final section of this article.

Published
2016-12-01
How to Cite
Kloppenburg, L., & Nissim, M. (2016). Native-data Models for Detecting and Correcting Errors in Learners’ Dutch. Computational Linguistics in the Netherlands Journal, 6, 39-55. Retrieved from https://clinjournal.org/clinj/article/view/63
Section
Articles