Generating Simplified Dutch Texts for Pupils Through N-Shot Learning
Abstract
Text simplification (TS) aims to improve the readability of a text while retaining its original meaning, aiding individuals with limited literacy skills or reading comprehension challenges. While substantial progress has been made in TS for English, there is a notable lack of research for Dutch, in part caused by the absence of Dutch parallel simplification corpora. This study investigates the effectiveness of N-shot learning using generative open-source large language models (LLMs) for TS in Dutch, circumventing the need for extensive parallel corpora. Various N-shot learning techniques are assessed for their performance in generating simplified Dutch texts for pupils. The readability and appropriateness of these texts are evaluated using automatic readability assessment models and human evaluations. Results indicate that while one-shot learning using a Dutch monolingual generative LLM shows the highest performance among the tested methods, the overall effectiveness is poor, with metrics close to what random guessing would achieve. Human evaluation further highlights significant issues: the generated outputs often match neither the intended readability levels nor the appropriateness required for specific educational contexts. These findings suggest that current N-shot learning methodologies are not effective for Dutch TS, emphasising the need for more refined approaches and better training data to improve performance in this task.