T-Scan: a new tool for analyzing Dutch text

  • Henk Pander Maat, Department of Languages, Literature and Communication, Utrecht University
  • Rogier Kraf, Department of Languages, Literature and Communication, Utrecht University
  • Antal van den Bosch, Centre for Language Studies, Radboud University Nijmegen
  • Nick Dekker, Department of Languages, Literature and Communication, Utrecht University
  • Maarten van Gompel, Centre for Language Studies, Radboud University Nijmegen
  • Suzanne Kleijn, Department of Languages, Literature and Communication, Utrecht University
  • Ted Sanders, Department of Languages, Literature and Communication, Utrecht University
  • Ko van der Sloot, Department of Communication and Information Sciences, Tilburg School of Humanities

Abstract

T-Scan is a new tool for analyzing Dutch text. It aims to extract text features that are theoretically interesting, in that they relate to genre and text complexity, as well as practically interesting, in that they enable users and text producers to make text-specific diagnoses. T-Scan derives its features from tools such as Frog and Alpino, and from resources such as SoNaR, SUBTLEX-NL and Referentie Bestand Nederlands.

This paper offers a qualitative discussion of a number of T-Scan features, based on a minimal demonstration corpus of six texts, three of them scientific articles and three of them drawn from a women’s magazine. We discuss features concerning lexical complexity, sentence complexity, referential cohesion and lexical diversity, lexical semantics and personal style. For all these domains we examine the construct validity as well as the reliability of a number of important features. We conclude that T-Scan offers a number of promising lexical and syntactic features, while the interpretation of the referential cohesion/lexical diversity features and the personal style features is less clear. Further development of the application and analysis of authentic text need to go hand in hand.

Published
2014-12-01
How to Cite
Pander Maat, H., Kraf, R., van den Bosch, A., Dekker, N., van Gompel, M., Kleijn, S., Sanders, T., & van der Sloot, K. (2014). T-Scan: a new tool for analyzing Dutch text. Computational Linguistics in the Netherlands Journal, 4, 53-74. Retrieved from https://clinjournal.org/clinj/article/view/40
Section
Articles