Detecting and correcting spelling errors in high-quality Dutch Wikipedia text

  • Merijn Beeksma, Radboud University Nijmegen, the Netherlands
  • Maarten van Gompel, Radboud University Nijmegen, the Netherlands
  • Florian Kunneman, Tilburg University, the Netherlands
  • Louis Onrust, Radboud University Nijmegen, the Netherlands
  • Bouke Regnerus, University of Twente, the Netherlands
  • Dennis Vinke, University of Twente, the Netherlands
  • Eduardo Brito, Fraunhofer IAIS, Sankt Augustin, Germany
  • Christian Bauckhage, Fraunhofer IAIS, Sankt Augustin, Germany
  • Rafet Sifa, Fraunhofer IAIS, Sankt Augustin, Germany

Abstract

For the CLIN28 shared task, we evaluated systems for spelling correction of high-quality text, focusing on detecting and correcting spelling errors in Dutch Wikipedia pages. Three teams took part. We compared their systems with a baseline, the Dutch spelling corrector Valkuil, and measured performance in terms of F1 score. Although two of the three participating systems performed well at correcting spelling errors, error detection proved challenging and, without exception, produced a high false positive rate. As a result, none of the systems improved on the baseline’s F1 score. This paper describes each team’s approach and discusses the overall challenges of correcting high-quality text.
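To illustrate why a high false positive rate can outweigh good correction ability, the sketch below computes precision, recall, and F1 from detection-level counts. It assumes a generic scoring scheme (a true positive is a real error that was flagged, a false positive is a correct word that was flagged, a false negative is a missed error), which is not necessarily the exact scoring used in the shared task. The class name `Counts` and the two example systems are hypothetical, and their counts are invented for illustration only; they are not results from the paper.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Detection-level counts for one system (hypothetical illustration)."""
    tp: int  # real errors the system flagged
    fp: int  # correct words the system wrongly flagged
    fn: int  # real errors the system missed


def precision(c: Counts) -> float:
    # Share of flagged words that really were errors.
    flagged = c.tp + c.fp
    return c.tp / flagged if flagged else 0.0


def recall(c: Counts) -> float:
    # Share of real errors that were flagged.
    actual = c.tp + c.fn
    return c.tp / actual if actual else 0.0


def f1(c: Counts) -> float:
    # Harmonic mean of precision and recall; defined as 0 when both are 0.
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if p + r else 0.0


if __name__ == "__main__":
    # Invented counts, NOT results from the CLIN28 shared task.
    conservative = Counts(tp=80, fp=20, fn=20)
    aggressive = Counts(tp=90, fp=210, fn=10)
    for name, c in [("conservative", conservative), ("aggressive", aggressive)]:
        print(f"{name}: P={precision(c):.2f} R={recall(c):.2f} F1={f1(c):.2f}")
```

With these made-up numbers the aggressive system finds more errors (recall 0.90 versus 0.80), but its many false alarms drop precision to 0.30 and F1 to 0.45, well below the conservative system's 0.80. Because errors are rare in high-quality text, even a modest rate of false alarms on correct words can easily outnumber the true detections.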

Published
2018-12-01
How to Cite
Beeksma, M., van Gompel, M., Kunneman, F., Onrust, L., Regnerus, B., Vinke, D., Brito, E., Bauckhage, C., & Sifa, R. (2018). Detecting and correcting spelling errors in high-quality Dutch Wikipedia text. Computational Linguistics in the Netherlands Journal, 8, 122-137. Retrieved from https://clinjournal.org/clinj/article/view/83