Fuzzy Semantic Retrieval Strategies for Automated Short-Answer Grading with Large Language Models in Language Learning

Authors

Abstract

Automated assessment of short-answer exercises in language learning faces a fundamental challenge: multiple admissibility means that numerous distinct responses can be equally correct, rendering rule-based evaluation inadequate. This paper investigates how large language models (LLMs) can address this challenge through retrieval-augmented generation (RAG) for binary classification, determining whether a student's response to a given exercise is correct or incorrect. Across 306 experiments spanning nine grammar topics in English, Spanish, and Dutch (1,185 authentic student responses), ten retrieval approaches are evaluated, extending prior work on baselines and exercise-level matching with novel strategies: sentence-level matching, random selection, and semantic similarity methods adapted from fuzzy matching in translation memory systems. Two central findings emerge. First, RAG with semantic similarity proves effective for identifying relevant examples: when optimised per topic, it achieves 89.4% classification accuracy and recall up to 4.3 percentage points higher than rule-based exercise-level matching. Second, an accuracy-recall trade-off governs configuration choice: single-example configurations maximise accuracy (87.8%), while higher shot counts maximise recall (93.0%). These results establish new performance benchmarks for LLM-based short-answer grading in second language acquisition, with actionable guidance: student-facing applications should use low shot counts to optimise accuracy, while teacher-facing systems benefit from higher shot counts to ensure comprehensive error detection.
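The retrieval step summarised above can be illustrated with a minimal sketch. It is an assumption-laden illustration, not the paper's implementation: the standard-library difflib character-level ratio stands in for the semantic similarity measure, and the function names (similarity, retrieve_examples, build_prompt), the example pool, and the prompt layout are all hypothetical. The shot count k is the parameter that the reported accuracy-recall trade-off concerns.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Fuzzy match score in [0, 1].

    Assumption: a character-level ratio used as a stand-in for the
    semantic similarity measure; the paper's actual scoring is not shown here.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def retrieve_examples(response: str, pool: list[dict], k: int = 1) -> list[dict]:
    """Return the k previously graded responses most similar to the new one."""
    ranked = sorted(
        pool,
        key=lambda ex: similarity(response, ex["response"]),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(exercise: str, response: str, examples: list[dict]) -> str:
    """Assemble a few-shot grading prompt (hypothetical layout)."""
    lines = [f"Exercise: {exercise}"]
    for ex in examples:
        lines.append(f"Response: {ex['response']} -> {ex['label']}")
    # The final line is left open for the model to complete with a label.
    lines.append(f"Response: {response} -> ")
    return "\n".join(lines)


# Hypothetical pool of graded responses for one exercise.
pool = [
    {"response": "She has gone to the market.", "label": "correct"},
    {"response": "She have gone to the market.", "label": "incorrect"},
    {"response": "They went home yesterday.", "label": "correct"},
]

new_response = "She have went to the market."
shots = retrieve_examples(new_response, pool, k=1)
prompt = build_prompt("Complete the sentence in the present perfect.", new_response, shots)
print(prompt)
```

Under these assumptions, k=1 corresponds to the single-example configuration and larger k to the higher shot counts discussed in the abstract; the resulting prompt would then be passed to an LLM for the binary correct/incorrect decision.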

Published

2026-06-01

Issue

Section

Articles

How to Cite

Fuzzy Semantic Retrieval Strategies for Automated Short-Answer Grading with Large Language Models in Language Learning. (2026). Computational Linguistics in the Netherlands Journal, 15, 79-103. https://clinjournal.org/clinj/article/view/246
