Investigating ‘Aspect’ in NMT and SMT: Translating the English Simple Past and Present Perfect
Abstract
One important difference between English grammar and that of French and Spanish concerns how their verbal systems encode aspectual information. While the English simple past tense is aspectually neutral, the French and Spanish past tenses are associated with either imperfective or perfective aspect. This study examines what Statistical Machine Translation (SMT) and Neural Machine Translation (NMT) systems learn about ‘aspect’ and how this is reflected in the translations they produce. We use their main knowledge sources, phrase-tables (SMT) and encoding vectors (NMT), to examine what kind of aspectual information they encode. Furthermore, we examine whether this encoded ‘knowledge’ is transferred during decoding and is thus reflected in the output translations. Our study is based on the translation of the English simple past and present perfect tenses into the French and Spanish imperfective and perfective past tenses. We examine the interaction between the lexical aspect of English simple past verbs and the grammatical aspect expressed by the tense in the French and Spanish translations. The results show that SMT phrase-tables contain information about the basic lexical aspect of verbs. Although lexical aspect is often closely related to the grammatical aspect expressed by the French and Spanish tenses, for some verbs (mainly atelic dynamic verbs) more contextual information is needed to select an appropriate tense. The n-grams used by SMT provide too little context to capture the other aspectual factors present in the sentence, so the system cannot consistently select a tense with the appropriate aspectual value. The encoding vectors produced by our NMT system, by contrast, do contain information about the entire sentence. A logistic regression model trained on the English NMT encoding vectors predicts the correct tense with an accuracy of 90%. However, these positive results are only partly reflected in the actual translations: part of the aspectual information is lost during decoding.
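The abstract reports that a logistic regression model can predict the target tense from the English NMT encoding vectors. The Python sketch below illustrates what such a probing classifier looks like, under assumptions that are not taken from the study: the "encoding vectors" are synthetic Gaussian data standing in for real encoder states, the task is simplified to a binary perfective/imperfective label, and all names (`make_synthetic_encodings`, `train_logreg`, `accuracy`) are hypothetical. It is not the authors' implementation, and the accuracy it prints says nothing about the 90% reported above.

```python
import math
import random


def make_synthetic_encodings(n, dim, seed=0):
    """HYPOTHETICAL stand-in for NMT encoder vectors (not real model output).

    Label 1 = perfective target tense, label 0 = imperfective target tense.
    The two classes differ only in the mean of the first coordinate.
    """
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = i % 2  # balanced classes
        shift = 1.5 if label == 1 else -1.5
        vec = [rng.gauss(shift if d == 0 else 0.0, 1.0) for d in range(dim)]
        data.append((vec, label))
    rng.shuffle(data)
    return data


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w, b, x):
    # Linear score w·x + b, i.e. the log-odds of label 1.
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_logreg(data, dim, lr=0.1, epochs=200):
    """Batch gradient descent on the mean logistic (cross-entropy) loss."""
    w, b = [0.0] * dim, 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in data:
            err = sigmoid(score(w, b, x)) - y  # derivative of the loss w.r.t. the score
            for d in range(dim):
                grad_w[d] += err * x[d]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(w, b, data):
    # Fraction of examples whose predicted label (p >= 0.5 means 1) matches the gold label.
    correct = sum((sigmoid(score(w, b, x)) >= 0.5) == (y == 1) for x, y in data)
    return correct / len(data)


if __name__ == "__main__":
    data = make_synthetic_encodings(400, dim=8)
    train, test = data[:300], data[300:]  # held-out split for the probe
    w, b = train_logreg(train, dim=8)
    print(f"held-out probe accuracy: {accuracy(w, b, test):.2f}")
```

Plain batch gradient descent keeps the sketch dependency-free. In practice one would more likely use an off-the-shelf implementation such as scikit-learn's `LogisticRegression` and feed it vectors extracted from a trained encoder; how the study pooled or selected its encoding vectors is not stated in this abstract, so any choice there (e.g. mean-pooling hidden states) is an assumption.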