Automatic detection and correction of context-dependent dt-mistakes using neural networks

  • Geert Heyman Department of Computer Science KU Leuven
  • Ivan Vulić Language Technology Lab, DTAL, University of Cambridge, UK
  • Yannick Laevaert Department of Computer Science KU Leuven
  • Marie-Francine Moens Department of Computer Science KU Leuven

Abstract

We introduce a novel approach to correcting context-dependent dt-mistakes, one of the most frequent spelling errors in the Dutch language. We show that by using a neural network to estimate the probability distribution of a verb’s suffix conditioned jointly on its stem and context, we obtain large improvements over state-of-the-art spell checkers on three different benchmarking datasets, achieving a perfect score on a verb spelling test from De Standaard, a Flemish newspaper. The method is unsupervised and relies only on basic preprocessing tools to tokenize the text and identify verbs, which enables training on millions of sentences. Furthermore, we propose a method to determine which words in a sentence cause the system to make corrections, which is valuable for providing feedback to the user.
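The correction step described in the abstract can be illustrated with a minimal sketch. It assumes a trained model has already produced a score for each candidate suffix of a verb, given its stem and context. The function names, the example scores, and the 0.5 decision threshold are hypothetical choices made for illustration and are not taken from the paper.

```python
import math
from typing import Dict, Optional


def softmax(scores: Dict[str, float]) -> Dict[str, float]:
    """Turn raw suffix scores into a probability distribution."""
    highest = max(scores.values())
    # Subtract the maximum before exponentiating for numerical stability.
    exps = {suffix: math.exp(s - highest) for suffix, s in scores.items()}
    total = sum(exps.values())
    return {suffix: e / total for suffix, e in exps.items()}


def suggest_correction(
    stem: str,
    written_suffix: str,
    suffix_probs: Dict[str, float],
    threshold: float = 0.5,  # hypothetical cut-off, not from the paper
) -> Optional[str]:
    """Return a corrected verb form, or None if no correction is suggested.

    suffix_probs stands in for the output of a model that estimates
    P(suffix | stem, context); here it is supplied by the caller.
    """
    best_suffix = max(suffix_probs, key=suffix_probs.get)
    if best_suffix != written_suffix and suffix_probs[best_suffix] >= threshold:
        return stem + best_suffix
    return None


# Hypothetical scores for the verb in "Hij word morgen opgehaald",
# where the writer used the stem "word" with an empty suffix.
probs = softmax({"": -1.0, "t": 2.0})
print(suggest_correction("word", "", probs))
```

With these made-up scores the suffix "t" receives most of the probability mass, so the sketch proposes "wordt"; if the writer had already used "t", it would return None and leave the verb unchanged.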

Published
2018-12-01
How to Cite
Heyman, G., Vulić, I., Laevaert, Y., & Moens, M.-F. (2018). Automatic detection and correction of context-dependent dt-mistakes using neural networks. Computational Linguistics in the Netherlands Journal, 8, 49-65. Retrieved from https://clinjournal.org/clinj/article/view/79
Section
Articles