Linguistic proxies of readability: Comparing easy-to-read and regular newspaper Dutch

Vincent Vandeghinste; Bram Bulté

Authors

Vincent Vandeghinste
Bram Bulté

Abstract

The aim of this study is to identify linguistic proxies of readability in Dutch, i.e. the linguistic features that characterize a text as easy to read. To this end, we compare the Wablieft corpus (Vandeghinste et al. 2019), the archives of a Flemish easy-to-read newspaper, to articles that appeared in the regular Flemish newspaper De Standaard, using a wide range of lexical, syntactic and readability metrics. We test which of these metrics have the largest effect sizes and which combinations of metrics work best in a classification task predicting whether an article belongs to Wablieft or De Standaard. The results indicate that the best linguistic proxy for readability is, not surprisingly, the average number of words per sentence. Traditional readability metrics score well, although the parameters that constitute these metrics, when combined in a logistic regression, score better than the original metrics.
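
As a minimal illustration of the kind of comparison the abstract describes, the sketch below computes the average number of words per sentence for two small groups of texts and the standardized difference between them. It is not the authors' pipeline: the regex-based sentence splitter, the choice of Cohen's d as the effect-size measure, the function names, and the toy Dutch sentences are all assumptions made for illustration, and none of the numbers it produces relate to the Wablieft or De Standaard data.

```python
import re
from statistics import mean, variance


def avg_words_per_sentence(text: str) -> float:
    """Mean number of whitespace-separated tokens per sentence.

    Assumption: sentences end at '.', '!' or '?'. A real study would
    use a proper tokenizer and sentence splitter.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    return mean(len(s.split()) for s in sentences)


def cohens_d(a, b) -> float:
    """Cohen's d with a pooled standard deviation.

    Assumption: the abstract does not say which effect-size measure
    was used; Cohen's d is only one common choice.
    """
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / pooled_var ** 0.5


# Toy, made-up texts (hypothetical; not taken from either newspaper).
easy_texts = [
    "De kat slaapt. De hond eet.",
    "Het regent. Wij blijven binnen.",
]
regular_texts = [
    "De regering heeft gisteren een nieuw akkoord over de begroting bereikt.",
    "Volgens de minister zal de maatregel pas volgend jaar in werking treden. "
    "Dat bleek na afloop.",
]

easy_scores = [avg_words_per_sentence(t) for t in easy_texts]
regular_scores = [avg_words_per_sentence(t) for t in regular_texts]

# A negative d means the easy-to-read group has shorter sentences on average.
d = cohens_d(easy_scores, regular_scores)
print(f"easy: {easy_scores}, regular: {regular_scores}, d = {d:.2f}")
```

The same per-article feature values could then serve as predictors in a classifier such as logistic regression, which is the second kind of analysis the abstract mentions.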

Author Biographies

Vincent Vandeghinste

Instituut voor de Nederlandse Taal (Leiden, Netherlands)

Bram Bulté

KU Leuven (Belgium)
