Integrating Fuzzy Matches into Sentence-level Quality Estimation for Neural Machine Translation

Arda Tezcan

Authors

Arda Tezcan, Ghent University

Abstract

Previous studies show that neural machine translation (NMT) systems produce higher-quality translations when sentences highly similar to a given input sentence (i.e. fuzzy matches; FMs) can be found in the NMT training data. This study explores the usefulness of FMs for the task of sentence-level quality estimation (QE) for NMT. To this end, FMs are integrated, through a data augmentation methodology, into a QE architecture that utilizes a pre-trained XLM-RoBERTa model. The results show that FMs improve QE performance in domain-specific scenarios when translation edit rate (TER) is used as the quality label. However, similar improvements are not observed when the same methodology is applied to a general-domain setting in which quality labels are generated either through direct (manual) assessment of translation quality or by measuring the technical post-editing effort required to transform the MT output into its post-edited version.
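
The abstract does not specify which similarity metric is used to retrieve FMs or how they are attached to the QE input. Purely as an illustration of the general idea, the following minimal sketch assumes a token-level similarity computed with Python's `difflib`, a hypothetical similarity threshold of 0.5, and a hypothetical input format in which the target side of the best match is appended after a `</s>` separator. The function names `similarity`, `best_fuzzy_match` and `augment_qe_input` are made up for this example and are not taken from the study.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Token-level similarity in [0, 1] (assumption: not the study's metric)."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def best_fuzzy_match(source, memory, threshold=0.5):
    """Return (fm_source, fm_target, score) for the most similar stored source
    sentence, or None if no entry reaches the (hypothetical) threshold."""
    best = None
    best_score = 0.0
    for mem_src, mem_tgt in memory:
        score = similarity(source, mem_src)
        if score >= threshold and score > best_score:
            best = (mem_src, mem_tgt, score)
            best_score = score
    return best


def augment_qe_input(source, mt_output, memory, sep="</s>", threshold=0.5):
    """Build a QE input string; append the FM target when a match is found.
    The separator and field order are illustrative assumptions."""
    base = f"{source} {sep} {mt_output}"
    match = best_fuzzy_match(source, memory, threshold)
    if match is None:
        return base
    return f"{base} {sep} {match[1]}"


# Toy stand-in for NMT training data: (source, target) pairs.
memory = [
    ("press the red button to stop the machine",
     "druk op de rode knop om de machine te stoppen"),
    ("the weather is nice today",
     "het is mooi weer vandaag"),
]

source = "press the green button to stop the machine"
mt_output = "druk op de groene knop om de machine te stoppen"
augmented = augment_qe_input(source, mt_output, memory)
```

In this toy setup the augmented string would then be tokenized and passed to the QE model in place of the plain source and MT pair; inputs without a sufficiently similar match are left unchanged.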
