Improving Dutch sentiment analysis in Pattern
Abstract
In this paper we investigate methods for improving the sentiment analysis functionality of Pattern.nl, the Dutch submodule of Pattern, an open-source library for web mining and natural language processing. We discuss the impact on performance of three different potential improvements: extending the module’s internal sentiment lexicon; removing subsets of neutral words from the sentiment lexicon; and improving the algorithm for combining multiple word-level sentiment ratings into a sentence-level sentiment rating. We evaluated the improvements on datasets from the product review domain (books, clothing and music) and a dataset of short emotional stories. The experiments show that lexicon expansion does not lead to better results; new normalization techniques, on the other hand, show a limited but consistent performance increase for sentiment ratings.