BERT-based Transformer Fine-tuning for Dutch Wikidata Question-Answering
Abstract
People rely on data to understand the world and to inform their decision-making. However, effective access to data has become more challenging over time: data has grown in volume and velocity, as has its variability in truthfulness, utility, and format. Improving our interfaces to data has therefore become a pressing issue. One type of interface has lately gained renewed attention, driven by advances in artificial intelligence: natural language interfaces. As of yet, though, improvements in natural language processing (NLP) have largely concentrated on English. We therefore propose a text-based Dutch question-answering (QA) interface for accessing information on Wikidata (https://www.wikidata.org/), driven by a Dutch-to-SPARQL BERT-based transformer model. This transformer is an encoder-decoder model characterised by its use of self-attention. In our application, it is trained to accept sentences in Dutch and to transform them into corresponding SPARQL queries. By subsequently executing the obtained queries against a knowledge base, users can retrieve answers to their questions. Since our model learns end-to-end, we need to train it on a dataset of paired Dutch questions and SPARQL queries. To this end, we closely follow the procedure of Cui et al. (2021). Specifically, we create a Dutch machine-translated version of LC-QuAD 2.0 (Dubey et al. 2019) and apply entity and relation masking to the natural-language inputs and SPARQL outputs for increased generality, producing a dataset of 2,648 examples. We then fine-tune the transformer model on the training subset of this dataset, using system-level BLEU score as the performance measure. Our final transformer configuration obtains a test BLEU score of 51.86, which appears consistent with the results reported by Cui et al. (2021). Additionally, we conduct a qualitative analysis of our model's outputs, focusing especially on cases where the predicted SPARQL queries are incorrect. Here, we observe that queries involving infrequently used SPARQL keywords and queries containing literals prove challenging for the transformer, as do, at times, the syntax of SPARQL and the overall length of queries. Finally, we conclude by proposing some potential future directions for our Dutch QA system.
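To make the answer-retrieval step concrete, the sketch below shows how a generated SPARQL query can be executed against the public Wikidata endpoint. This is a minimal illustration in Python, not our system's actual code: the run_sparql helper, the User-Agent string, and the example query are assumptions introduced here for demonstration.

    import requests

    WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"

    def run_sparql(query: str) -> list:
        # Send the query to the public Wikidata SPARQL endpoint and
        # return the JSON result bindings.
        response = requests.get(
            WIKIDATA_ENDPOINT,
            params={"query": query, "format": "json"},
            headers={"User-Agent": "dutch-qa-demo/0.1 (illustrative example)"},
        )
        response.raise_for_status()
        return response.json()["results"]["bindings"]

    # A query of the kind the model is trained to produce, here for
    # "Wat is de hoofdstad van Nederland?" ("What is the capital of
    # the Netherlands?"); wd:Q55 is the Netherlands, wdt:P36 is "capital".
    query = "SELECT ?capital WHERE { wd:Q55 wdt:P36 ?capital . }"
    print(run_sparql(query))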
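The entity and relation masking described above can likewise be illustrated with a small sketch. Following the masking idea of Cui et al. (2021), concrete Wikidata identifiers and their surface forms are replaced with generic placeholder tokens in both the question and the query; the mask_pair function, the placeholder token format, and the example annotations below are all hypothetical.

    def mask_pair(question: str, sparql: str, entities: dict, relations: dict):
        # Replace concrete entities/relations with placeholder tokens in
        # both the Dutch question and the SPARQL query. `entities` maps
        # surface forms to Q-ids, `relations` maps surface forms to P-ids;
        # both are assumed to come from the dataset annotations.
        for i, (surface, qid) in enumerate(entities.items()):
            token = f"<ent_{i}>"
            question = question.replace(surface, token)
            sparql = sparql.replace(f"wd:{qid}", token)
        for i, (surface, pid) in enumerate(relations.items()):
            token = f"<rel_{i}>"
            question = question.replace(surface, token)
            sparql = sparql.replace(f"wdt:{pid}", token)
        return question, sparql

    q, s = mask_pair(
        "Wat is de hoofdstad van Nederland?",
        "SELECT ?x WHERE { wd:Q55 wdt:P36 ?x . }",
        entities={"Nederland": "Q55"},
        relations={"hoofdstad": "P36"},
    )
    # q == "Wat is de <rel_0> van <ent_0>?"
    # s == "SELECT ?x WHERE { <ent_0> <rel_0> ?x . }"

Masking in this way lets the model learn the structural mapping from Dutch to SPARQL without having to memorise individual Wikidata identifiers.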
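Finally, system-level BLEU, our performance measure, can be computed over the predicted and gold SPARQL strings with an off-the-shelf library such as sacrebleu; the toy predictions and references below are illustrative placeholders, not our evaluation data.

    import sacrebleu

    # Hypothetical model outputs and gold (masked) SPARQL queries.
    predictions = ["SELECT ?x WHERE { <ent_0> <rel_0> ?x . }"]
    references = ["SELECT ?x WHERE { <ent_0> <rel_0> ?x . }"]

    # corpus_bleu aggregates over all pairs, giving system-level BLEU.
    bleu = sacrebleu.corpus_bleu(predictions, [references])
    print(f"System-level BLEU: {bleu.score:.2f}")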