Improving Domain-specific Cross-lingual Embeddings with Automatically Generated Bilingual Dictionaries

Abstract

This paper reports on a set of proof-of-concept experiments to evaluate and improve the alignment of monolingual embeddings for a specialised domain, viz. the medical use case of heart failure. The presented approach, which creates domain-specific dictionaries on the fly from cross-lingual Wikipedia links, achieves good results for the cross-lingual alignment of this specialised vocabulary in three language pairs: English-Dutch, English-French, and Dutch-French. The experimental results show that the setup using a smaller, dedicated domain-specific dictionary outperforms the setup using a larger, general-domain seed dictionary. A detailed error analysis reveals that many potentially useful (near-)equivalents are found beyond those present in the gold standard, and it suggests strategies for further improvement, such as lemmatisation and better tokenisation.
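To make the pipeline in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical `langlinks` structure mapping each domain article to its titles per language (as one might collect from Wikipedia interlanguage links), turns those title pairs into a seed dictionary, and then fits a mapping between the two embedding spaces. The abstract does not say which mapping method the paper uses; orthogonal Procrustes, a common supervised technique, is swapped in here for illustration. The function names (`seed_pairs`, `procrustes`, `translate`) and all toy vectors and words are hypothetical.

```python
import numpy as np


def seed_pairs(langlinks, src, tgt, src_vocab, tgt_vocab):
    """Turn interlanguage links into (source word, target word) seed pairs.

    `langlinks` is a hypothetical structure: {article_id: {lang_code: title}}.
    Titles are lowercased and multi-word titles joined with underscores;
    pairs whose words are missing from either vocabulary are dropped.
    """
    pairs = []
    for titles in langlinks.values():
        if src in titles and tgt in titles:
            s = titles[src].lower().replace(" ", "_")
            t = titles[tgt].lower().replace(" ", "_")
            if s in src_vocab and t in tgt_vocab:
                pairs.append((s, t))
    return pairs


def procrustes(X, Y):
    """Orthogonal W minimising ||X @ W - Y||_F; row i of X is paired with row i of Y."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt


def translate(word, src_emb, tgt_emb, W):
    """Return the target word with the highest cosine to the mapped source vector."""
    words = list(tgt_emb)
    T = np.stack([tgt_emb[w] for w in words])
    T = T / np.linalg.norm(T, axis=1, keepdims=True)
    q = src_emb[word] @ W
    q = q / np.linalg.norm(q)
    return words[int(np.argmax(T @ q))]


# Toy data (illustrative only): the Dutch space is an exact rotation of the English one.
rng = np.random.default_rng(0)
R, _ = np.linalg.qr(rng.normal(size=(4, 4)))  # hidden "true" rotation
en_words = ["heart_failure", "heart", "diuretic", "ventricle", "oedema", "dyspnoea"]
nl_words = ["hartfalen", "hart", "diureticum", "ventrikel", "oedeem", "dyspneu"]
en = {w: rng.normal(size=4) for w in en_words}
nl = {n: en[e] @ R for e, n in zip(en_words, nl_words)}

langlinks = {
    1: {"en": "Heart failure", "nl": "Hartfalen"},
    2: {"en": "Heart", "nl": "Hart"},
    3: {"en": "Diuretic", "nl": "Diureticum"},
    4: {"en": "Ventricle", "nl": "Ventrikel"},
    5: {"en": "Cardiology", "nl": "Cardiologie"},  # dropped: not in the toy vocabularies
}

pairs = seed_pairs(langlinks, "en", "nl", en, nl)
X = np.stack([en[s] for s, _ in pairs])
Y = np.stack([nl[t] for _, t in pairs])
W = procrustes(X, Y)
print(translate("oedema", en, nl, W))  # held-out word, prints: oedeem
```

Restricting `W` to be orthogonal preserves distances and angles within the source space, so the mapping cannot distort the monolingual neighbourhoods it is aligning. The `in vocab` filter is where the issues raised in the error analysis would surface in practice: inflected or differently tokenised titles simply fail to match, which is one reason lemmatisation and better tokenisation are plausible next steps.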

Published

2022-12-22

Section

Articles

How to Cite

Improving Domain-specific Cross-lingual Embeddings with Automatically Generated Bilingual Dictionaries. (2022). Computational Linguistics in the Netherlands Journal, 12, 125-140. https://clinjournal.org/clinj/article/view/151
