Noun Phrase and Verb Phrase Ellipsis in Dutch: Identifying Subject-Verb Dependencies with BERTje
Abstract
Previous research has set out to quantify the syntactic capacity of BERTje (the Dutch equivalent of BERT) in the context of phenomena such as control verb nesting and verb raising in Dutch. Another complex language phenomenon is ellipsis, where a constituent is omitted from a sentence and can be recovered using context. Like verb raising and control verb nesting, ellipsis is suitable for evaluating BERTje’s linguistic capacity since it requires the processing of syntactic and lexical cues to recover the elided phrases. This work outlines an approach to identify subject-verb dependencies in Dutch sentences with verb phrase and noun phrase ellipsis using BERTje. Results will inform about BERTje’s capability of capturing syntactic information and its ability to capture ellipsis in particular. Understanding more about how computational models process ellipsis and how it can be improved is crucial for boosting the performance of language models, as natural language contains many instances of ellipsis. Using training data from Lassy, converted to contextualized embeddings using BERTje, a probe model is trained to identify subject-verb dependencies. The model is tested on sentences generated using a Context Free Grammar (CFG), which is designed to generate sentences containing ellipsis. These sentences are also converted to contextualized representations using BERTje. Results show that BERTje’s syntactic abilities are lacking, shown by accuracy drops compared to baseline measures.
