Tweet geography. Tweet Based Mapping of Dialect Features in Dutch Limburg
Abstract
We investigated whether tweets can be used to map dialect features (such as pronunciation or lexis) in the Dutch province of Limburg and, if so, how the resulting maps can be interpreted. We developed a mapping procedure based on the relative frequency with which individual Twitter users use dialect variants and the relative frequencies of their geographically neighbouring Twitter users. We evaluated this procedure by comparing the geographical locations of written dialect variants retrieved from Twitter with the isoglosses and dialect regions known from dialectology. The results show that Twitter, when used with some caution, can indeed be a good source for dialect studies that track new patterns of dialect variation caused by dialect shift and loss, internal migration within Limburg and the immigration of non-dialect speakers. Next, we compared, for the same Twitter data, this knowledge-rich approach (known dialect variants) to a knowledge-poor approach (letter trigrams). Here we found that trigram counts correlate strongly with dialect variant counts, but the exact relation between the two needs further study.
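To make the two kinds of counts concrete, the following minimal Python sketch illustrates one possible reading of them. It is not the procedure used in the study: the abstract does not specify how a user's relative frequency is combined with those of neighbouring users, so the unweighted mean within a fixed radius, the planar kilometre coordinates, and all function and parameter names (`relative_frequency`, `neighbourhood_score`, `letter_trigrams`, `radius_km`) are assumptions made purely for illustration.

```python
from collections import Counter
from math import hypot


def relative_frequency(variant_count, total_count):
    """Share of a user's relevant tokens that show the dialect variant."""
    return variant_count / total_count if total_count else 0.0


def neighbourhood_score(users, index, radius_km):
    """Mean relative frequency of a user and all users within radius_km.

    users: list of (x_km, y_km, rel_freq) tuples in planar coordinates
    (an assumption; real data would need projected geocoordinates).
    The unweighted mean is a hypothetical choice of combination rule.
    """
    x0, y0, _ = users[index]
    # The user is at distance 0 from itself, so it is always included.
    freqs = [f for (x, y, f) in users if hypot(x - x0, y - y0) <= radius_km]
    return sum(freqs) / len(freqs)


def letter_trigrams(text):
    """Count overlapping letter trigrams (the knowledge-poor features)."""
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


# Hypothetical toy data: two nearby users and one distant user.
users = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.5), (10.0, 0.0, 0.0)]
score_first = neighbourhood_score(users, 0, radius_km=2.0)
score_last = neighbourhood_score(users, 2, radius_km=2.0)
```

In this toy setup the first user's score averages over the two nearby users, whereas the distant user's score rests on that user's own relative frequency alone. A distance-weighted mean would be an equally plausible alternative under the same assumptions.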