Lexical semantic change detection for Ancient Greek: dataset creation and evaluation of a word-embedding-based technique
Abstract
We create a benchmark for evaluating lexical semantic change detection in Ancient Greek and use it to assess the validity of two metrics of lexical semantic change computed on diachronic embedding models. Stopponi et al. (2024b) assessed the viability of lexical semantic change detection for Ancient Greek with word2vec models, using two existing measures. However, only a manual evaluation was conducted, because no benchmark for this task existed for Ancient Greek. We create such a benchmark by extracting cases of semantic change from close-reading studies in Ancient Greek lexical semantics. We also create a parallel benchmark of semantically stable items and assess how effectively the more relevant of the two metrics distinguishes semantically changed from semantically stable items. Finally, we qualitatively evaluate the candidates for semantic change detected by filtering for words with low vector coherence values and high frequency. The results show that the method is effective at retrieving cases of semantic change, especially when coupled with frequency information, but they also reinforce the idea that performing lexical semantic change detection on an ancient language and building a robust evaluation benchmark are particularly challenging tasks. In conclusion, we propose a constructive way to leverage this method as a research companion by integrating it with the close-reading method.
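The candidate-filtering step mentioned above can be illustrated with a minimal sketch. It assumes that vector coherence is the cosine similarity between a word's vectors in two aligned time slices, which the abstract does not spell out. The function names, the thresholds, and the toy vectors and frequencies are hypothetical placeholders, not values from the study.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def change_candidates(early, late, freq, max_coherence, min_freq):
    """Return (word, coherence) pairs with low coherence and high frequency.

    early, late: dicts mapping words to vectors from two time slices that
                 are assumed to be already aligned in the same space.
    freq:        dict mapping words to corpus frequency.
    Both thresholds are hypothetical and would need tuning on real data.
    """
    candidates = []
    for word in early.keys() & late.keys():
        coherence = cosine(early[word], late[word])
        if coherence <= max_coherence and freq.get(word, 0) >= min_freq:
            candidates.append((word, coherence))
    # Lowest coherence first: the strongest candidates for change.
    return sorted(candidates, key=lambda pair: pair[1])


# Toy, invented data: placeholder words with two-dimensional vectors.
early = {"stable": [1.0, 0.0], "shifted": [1.0, 0.0], "rare": [1.0, 0.0]}
late = {"stable": [0.9, 0.1], "shifted": [0.0, 1.0], "rare": [0.0, 1.0]}
freq = {"stable": 500, "shifted": 400, "rare": 3}

result = change_candidates(early, late, freq, max_coherence=0.5, min_freq=100)
print(result)
```

In this toy setting only the placeholder word "shifted" passes both filters: "stable" keeps a high coherence, and "rare" has low coherence but falls below the frequency threshold, which reflects the idea that low-frequency words yield less reliable vectors.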