Profiling Dutch Authors on Twitter: Discovering Political Preference and Income Level
Abstract
Research in author profiling has primarily focused on English-speaking users and on attributes such as age, gender and occupation. We present first experiments on automatically profiling Dutch Twitter users for two less-studied attributes: political preference and income level (low vs. high). We create two novel corpora using distant supervision, evaluate the corpus creation approach, and train predictive models for each attribute. Our empirical evaluation shows that distant supervision is surprisingly reliable, and that the political preference and income level of Dutch users can be predicted relatively accurately from linguistic input. We also discuss which features are predictive of income and political preference, respectively.