Improving Dutch sentiment analysis in Pattern

  • Lorenzo Gatti, Universiteit Twente
  • Judith van Stegeren, Universiteit Twente

Abstract

In this paper we investigate methods for improving the sentiment analysis functionality of Pattern.nl, the Dutch submodule of Pattern, an open-source library for web mining and natural language processing. We discuss the impact on performance of three different potential improvements: extending the module’s internal sentiment lexicon; removing subsets of neutral words from the sentiment lexicon; and improving the algorithm for combining multiple word-level sentiment ratings into a sentence-level sentiment rating. We evaluated the improvements on datasets from the product review domain (books, clothing and music) and a dataset of short emotional stories. The experiments show that lexicon expansion does not lead to better results; new normalization techniques, on the other hand, show a limited but consistent performance increase for sentiment ratings.
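
To make the last two ideas in the abstract concrete, the following is a minimal sketch of lexicon-based scoring: word-level polarities are looked up in a lexicon and averaged into a sentence-level rating, optionally skipping neutral entries. It is an illustration only, not Pattern's actual algorithm or any of the normalization techniques evaluated in the paper. The lexicon entries, their scores, and the names `TOY_LEXICON`, `sentence_polarity` and `skip_neutral` are all hypothetical.

```python
from statistics import mean

# Hypothetical toy lexicon: word -> polarity in [-1.0, 1.0].
# These entries and scores are invented for illustration, not taken from Pattern.
TOY_LEXICON = {
    "goed": 0.6,
    "mooi": 0.8,
    "slecht": -0.7,
    "boek": 0.0,
    "verhaal": 0.0,
}


def sentence_polarity(tokens, lexicon, skip_neutral=False):
    """Average the polarities of all tokens found in the lexicon.

    With skip_neutral=True, entries scored exactly 0.0 are left out of the
    average, mimicking the removal of neutral words from the lexicon.
    Returns 0.0 when no token contributes a score.
    """
    scores = [lexicon[t] for t in tokens if t in lexicon]
    if skip_neutral:
        scores = [s for s in scores if s != 0.0]
    return mean(scores) if scores else 0.0


# "a beautiful book with a bad story"
tokens = "een mooi boek met een slecht verhaal".split()
with_neutral = sentence_polarity(tokens, TOY_LEXICON)
without_neutral = sentence_polarity(tokens, TOY_LEXICON, skip_neutral=True)
print(with_neutral, without_neutral)
```

In this toy case the two neutral nouns halve the averaged score, which suggests why the choice of neutral entries and of the combination rule can both shift sentence-level ratings.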

Published
2020-12-12
How to Cite
Gatti, L., & van Stegeren, J. (2020). Improving Dutch sentiment analysis in Pattern. Computational Linguistics in the Netherlands Journal, 10, 73-89. Retrieved from https://clinjournal.org/clinj/article/view/105