Detecting Dialect Features Using Normalised Pointwise Information
Abstract
Feature extraction is the identification of the features that differentiate one dialect group from another. It is an important step in understanding dialectal variation, and one that has traditionally been done manually. Manual extraction, however, has three drawbacks: it is time-consuming, it risks overlooking certain features, and different analysts may arrive at different sets of features. In this paper we compare two earlier automatic approaches to dialect feature extraction, namely Factor Analysis (Pickl 2016) and Prokić et al.’s (2012) method based on Fisher’s Linear Discriminant. We also introduce a new method based on Normalised Pointwise Mutual Information (nPMI), which outperforms the other methods on the tested data set.