Self-distillation for German and Dutch dependency parsing

Daniël de Kok; Tobias Pütz

Authors

Daniël de Kok University of Tübingen
Tobias Pütz University of Tübingen

Abstract

In this paper, we explore self-distillation as a means to improve statistical dependency parsing models for Dutch and German over purely supervised training. Self-distillation (Furlanello et al. 2018) trains a new student model on the output of an existing (weaker) teacher model. In contrast to most previous work on self-distillation, we perform distillation using a large, unannotated corpus. We show that in dependency parsing as sequence labeling (Spoustová and Spousta 2010, Strzyz et al. 2019), self-distillation plus finetuning provides large improvements over models that use supervised training. We carry out experiments on the German TüBa-D/Z universal dependency (UD) treebank (Çöltekin et al. 2017) and the UD conversion of the Dutch Lassy Small treebank (Bouma and van Noord 2017). We find that self-distillation improves German parsing accuracy of a bidirectional LSTM parser from 92.23 to 94.33 Labeled Attachment Score (LAS). Similarly, on Dutch we see improvement from 89.89 to 91.84 LAS.
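
To illustrate what dependency parsing as sequence labeling means, the following minimal sketch turns a dependency tree into one label per token and back again, and computes LAS. It assumes a relative head-offset encoding, which is one of the encodings described by Strzyz et al. (2019); the abstract does not state which encoding the parser uses. The function names, the label format, and the example sentence are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# One analysis per token: (head position, dependency relation).
# Positions are 1-based; head 0 stands for the artificial root.
Arc = Tuple[int, str]


def encode(arcs: List[Arc]) -> List[str]:
    """Encode each token's arc as a single label: '<head offset>/<relation>'."""
    labels = []
    for position, (head, relation) in enumerate(arcs, start=1):
        if head == 0:
            labels.append(f"root/{relation}")
        else:
            # Signed distance from the token to its head, e.g. '+1' or '-2'.
            labels.append(f"{head - position:+d}/{relation}")
    return labels


def decode(labels: List[str]) -> List[Arc]:
    """Invert encode(): recover (head, relation) from per-token labels."""
    arcs = []
    for position, label in enumerate(labels, start=1):
        offset, relation = label.split("/", 1)
        head = 0 if offset == "root" else position + int(offset)
        arcs.append((head, relation))
    return arcs


def las(gold: List[Arc], predicted: List[Arc]) -> float:
    """Labeled Attachment Score: % of tokens with correct head AND relation."""
    correct = sum(g == p for g, p in zip(gold, predicted))
    return 100.0 * correct / len(gold)


# Hypothetical example: "Der Hund schläft" (det -> Hund, nsubj -> schläft, root).
gold_arcs = [(2, "det"), (3, "nsubj"), (0, "root")]
gold_labels = encode(gold_arcs)
# gold_labels == ["+1/det", "+1/nsubj", "root/root"]

# A prediction that gets the relation of the second token wrong.
predicted_arcs = decode(["+1/det", "+1/obj", "root/root"])
score = las(gold_arcs, predicted_arcs)  # 2 of 3 tokens are fully correct
```

With such an encoding, any sequence tagger (for example a bidirectional LSTM) can act as a parser by predicting one label per token. A teacher tagger's predicted labels on unannotated text can then serve as training targets for a student, which is the setting that self-distillation relies on.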
