Intrinsic Evaluation of Mono- and Multilingual Dutch Language Models
Abstract
Through transfer learning, multilingual language models can produce good results on extrinsic, downstream NLP tasks in low-resource languages despite limited training data. In most cases, however, monolingual models still perform better. Using the Dutch SimLex-999 dataset, we intrinsically evaluate several pre-trained monolingual stacked-encoder LLMs for Dutch and compare them to several multilingual models that support Dutch, among them a pair with parallel architectures (the monolingual BERTje and the multilingual mBERT). We also attempt to improve the multilingual models’ semantic representations by fine-tuning them on additional Dutch data. Furthermore, we explore the effect of fine-tuning on written versus transcribed spoken data. While fine-tuning does improve multilingual model performance, we find that substantial amounts of fine-tuning data and compute are required to outperform the monolingual models on the intrinsic evaluation metric.
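As a concrete illustration of the kind of intrinsic evaluation described above, the sketch below scores word pairs from a SimLex-style file by the cosine similarity of their pooled encoder embeddings and correlates those scores with the human ratings via Spearman's rho. The choice of cosine similarity and Spearman correlation, the Hugging Face model ids, and the tab-separated file layout (word1, word2, rating) are assumptions made for illustration, not details taken from the paper.

```python
# Minimal sketch of an intrinsic evaluation on SimLex-style word pairs.
# Assumptions (not from the paper): cosine similarity of mean-pooled subword
# embeddings, Spearman correlation against human ratings, these model ids,
# and a tab-separated input file "word1<TAB>word2<TAB>rating".
import csv

import torch
from scipy.stats import spearmanr
from transformers import AutoModel, AutoTokenizer

# BERTje; swap in "bert-base-multilingual-cased" to score mBERT instead.
MODEL_ID = "GroNLP/bert-base-dutch-cased"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)
model.eval()


def embed(word: str) -> torch.Tensor:
    """Mean-pool the last hidden layer over the word's subword tokens."""
    inputs = tokenizer(word, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # (seq_len, dim)
    return hidden[1:-1].mean(dim=0)  # drop [CLS] and [SEP] before pooling


def evaluate(pairs_path: str) -> float:
    """Spearman correlation between model similarities and human ratings."""
    model_scores, human_scores = [], []
    with open(pairs_path, newline="", encoding="utf-8") as f:
        for word1, word2, rating in csv.reader(f, delimiter="\t"):
            similarity = torch.nn.functional.cosine_similarity(
                embed(word1), embed(word2), dim=0
            )
            model_scores.append(similarity.item())
            human_scores.append(float(rating))
    rho, _ = spearmanr(model_scores, human_scores)
    return rho


if __name__ == "__main__":
    # Hypothetical path to a Dutch SimLex-999 file in the assumed format.
    print(f"Spearman rho: {evaluate('simlex999_nl.tsv'):.3f}")
```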