Evaluating Dutch Speakers and Large Language Models on Standard Dutch: A Grammatical Challenge Set Based on the Algemene Nederlandse Spraakkunst
Abstract
This study evaluates the linguistic knowledge of Dutch Large Language Models (LLMs) by introducing a novel challenge set based on the Algemene Nederlandse Spraakkunst (ANS), a comprehensive resource on Dutch prescriptive grammar created by linguists. We collect acceptability judgements from native speakers of Dutch on our dataset, validating its usability while observing varying degrees of grammatical acceptability for specific syntactic phenomena. We evaluate both transformer-encoder and transformer-decoder Dutch LLMs on this dataset and compare their performance against both the standard rules of Dutch in our dataset and the speaker ratings. We find that transformer-encoder models achieve almost perfect accuracy on our dataset, yet sensitivity to specific sentences differs between models and humans, partially due to mismatches between the reference grammar and actual use of Dutch.