Improving Domain-specific Cross-lingual Embeddings with Automatically Generated Bilingual Dictionaries

Pranaydeep Singh; Ayla Rigouts Terryn; Els Lefever

Authors

Pranaydeep Singh, Ghent University
Ayla Rigouts Terryn, KU Leuven
Els Lefever, Ghent University

Abstract

This paper reports on a set of proof-of-concept experiments performed to evaluate and improve the alignment of monolingual embeddings for a specialised domain, viz. the medical use case of heart failure. The presented approach, which creates domain-specific dictionaries on the fly from cross-lingual Wikipedia links, achieves good results for cross-lingual alignment of this specialised vocabulary in three language pairs: English-Dutch, English-French, and Dutch-French. The experimental results show that the setup incorporating a smaller but dedicated domain-specific dictionary outperforms the alignment incorporating a larger but general-domain seed dictionary. A detailed error analysis reveals that many potentially useful (near-)equivalents are found beyond those present in the gold standard, and it suggests strategies for further improvement, such as lemmatisation and improved tokenisation.
