Integrating Fuzzy Matches into Sentence-level Quality Estimation for Neural Machine Translation
Abstract
Previous studies show that neural machine translation (NMT) systems produce higher-quality translations when sentences highly similar to a given input sentence (i.e., fuzzy matches; FMs) can be found in the NMT training data. This study explores the usefulness of FMs for the task of sentence-level quality estimation (QE) for NMT. To this end, FMs are integrated, through a data augmentation methodology, into a QE architecture that utilizes a pre-trained XLM-RoBERTa model. The results show that FMs improve QE performance in domain-specific scenarios when translation edit rate (TER) is used as the quality label. However, similar improvements are not observed when the same methodology is applied in a general-domain setting in which quality labels were generated either through direct (manual) assessment of translation quality or by measuring the technical post-editing effort required to transform the machine translation (MT) output into its post-edited version.