NMT’s wonderland where people turn into rabbits. A study on the comprehensibility of newly invented words in NMT output

Lieve Macken; Laura Van Brussel; Joke Daems

Abstract

Machine translation (MT) quality has improved enormously since the arrival of neural machine translation (NMT). The most noticeable improvement over statistical MT systems is the increased grammaticality and fluency of the MT output. At the lexical level, however, the quality of NMT systems is less promising. New types of lexical mistakes appear in NMT output, such as non-existing words, i.e. words that are not part of the target-language vocabulary and were thus invented by the NMT system. In MT use cases where readers only have access to the MT output and not to the source text, such non-existing words can hinder comprehension, because readers may be unable to recover the intended source meaning. To investigate whether, and to what extent, non-existing words in English-to-Dutch NMT output impair comprehension, an experiment was set up in SurveyMonkey. Eighty-six participants were given 15 non-existing words (5 single words and 10 noun compounds) and were asked either to describe the meaning of these words or to select the correct meaning from a predefined list. The words were presented either in isolation or in sentence context. In addition, participants were asked to indicate how confident they were about each answer. Results show that non-existing words do impair comprehension: in 60% of the cases, participants gave a wrong answer. Sentence context had a positive impact, making it easier for participants to determine the meaning of the non-existing word. Participants were also more confident about their answers when the words were presented in sentence context.

Author Biographies

Lieve Macken

University of Ghent

Laura Van Brussel

University of Ghent

Joke Daems

University of Ghent