Automatic word sense disambiguation for Dutch using dependency information

  • Hessel Haagsma, Rijksuniversiteit Groningen

Abstract

An automatic word sense disambiguation system utilizing dependency information is implemented using existing language resources for Dutch (Lassy, Alpino, Cornetto) and tested on a subset of DutchSemCor. The disambiguation method largely follows the one first proposed by Lin (1997). It defines words by their local contexts, represented as dependency triples. Words that occur in the same local context as the ambiguous word are assumed to be semantically close to it, and this assumption is used to create a list of similar words. The correct sense is then found by selecting the sense of the ambiguous word that is semantically closest to the words in this list.
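The selection step described above can be illustrated with a minimal sketch. It is not the system evaluated in the paper: the triples, the sense inventory, the sense labels and the function names are all invented for illustration, and simple set membership stands in for the semantic similarity measure that a resource such as Cornetto would provide.

```python
from collections import Counter

# Toy dependency triples (head, relation, dependent). Invented data,
# standing in for triples extracted from a parsed corpus.
TRIPLES = [
    ("drink", "obj", "water"),
    ("drink", "obj", "beer"),
    ("drink", "obj", "wine"),
    ("drink", "obj", "port"),
    ("pour", "obj", "port"),
    ("pour", "obj", "wine"),
    ("enter", "obj", "port"),
    ("enter", "obj", "harbour"),
]

# Hypothetical sense inventory: each sense is described by a set of
# related words. A real system would query a lexical database instead.
SENSES = {
    "port": {
        "port:wine": {"wine", "beer", "water"},
        "port:harbour": {"harbour", "dock"},
    }
}


def similar_words(triples, context, target):
    """Count the words that fill the same (head, relation) slot as target."""
    head, relation = context
    return Counter(
        dep
        for h, r, dep in triples
        if h == head and r == relation and dep != target
    )


def disambiguate(target, context, triples, senses):
    """Pick the sense whose related words overlap most with the similar words."""
    neighbours = similar_words(triples, context, target)
    scores = {
        sense: sum(n for word, n in neighbours.items() if word in related)
        for sense, related in senses[target].items()
    }
    # On a tie, max() returns the first sense in the inventory.
    return max(scores, key=scores.get)


print(disambiguate("port", ("drink", "obj"), TRIPLES, SENSES))  # port:wine
print(disambiguate("port", ("enter", "obj"), TRIPLES, SENSES))  # port:harbour
```

In this toy setting, the object slot of "drink" is shared with "water", "beer" and "wine", so the wine sense of "port" is selected, while the object slot of "enter" is shared only with "harbour", which selects the harbour sense.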

Performance is tested on a set of nouns, verbs and adjectives, and overall performance is comparable to or slightly higher than that reported by Lin: almost 9% over baseline for fine-grained sense distinctions and over 3% over baseline for coarse-grained sense distinctions. In absolute terms, disambiguation accuracy was highest for nouns, slightly lower for verbs and lowest for adjectives. The effect of using different local contexts and semantic databases was also tested. The results indicate that a reliable sense-annotated corpus is still required and that the quality and types of dependency relations in the local context database matter more than their quantity. Overall, performance is as expected, showing that dependency contexts are a useful feature for word sense disambiguation for Dutch.

Published
2015-11-01
How to Cite
Haagsma, H. (2015). Automatic word sense disambiguation for Dutch using dependency information. Computational Linguistics in the Netherlands Journal, 5, 15-24. Retrieved from https://clinjournal.org/clinj/article/view/54
Section
Articles