Evaluating the Impact of Word Classes on Cross-Domain Age Detection Models' Performance

Jens Van Nooten; Ilia Markov; Walter Daelemans

Authors

Jens Van Nooten, Universiteit Antwerpen
Ilia Markov, Universiteit Antwerpen
Walter Daelemans, Universiteit Antwerpen

Abstract

In this paper, we examine the importance of word category information for the age detection task (the task of identifying the age of a person based on their writing) under both in-domain and cross-domain conditions. We remove entire word classes and study the effect of their removal using both Support Vector Machines (SVM) and pre-trained contextual word embeddings (BERT). By conducting these experiments, we aim to gain insight into how both approaches handle cross-domain conditions. Our experiments show that SVM mainly relies on content words in the in-domain setting, while function words are the most indicative features in the cross-domain setup. BERT, on the other hand, mainly relies on highly frequent word classes, such as nouns and punctuation, to make predictions under both in-domain and cross-domain conditions.
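
The word-class removal step described in the abstract can be illustrated with a minimal sketch. It assumes the text has already been part-of-speech tagged. The function name `remove_word_class`, the tag names (Universal Dependencies style) and the `FUNCTION_TAGS` grouping are illustrative assumptions, not the authors' actual setup. The abstract does not say whether removed tokens are deleted or replaced by a placeholder, and this sketch simply deletes them.

```python
from typing import Iterable, Set, Tuple

# Hypothetical grouping of function-word tags (UD-style names);
# the tag inventory used in the paper may differ.
FUNCTION_TAGS: Set[str] = {"DET", "ADP", "PRON", "AUX", "CCONJ", "SCONJ", "PART"}


def remove_word_class(tagged_tokens: Iterable[Tuple[str, str]],
                      tags_to_remove: Set[str]) -> str:
    """Drop every token whose POS tag is in tags_to_remove and rejoin the rest."""
    return " ".join(tok for tok, tag in tagged_tokens if tag not in tags_to_remove)


# Toy pre-tagged sentence (tags assigned by hand for illustration).
tagged = [("The", "DET"), ("cat", "NOUN"), ("sat", "VERB"), ("on", "ADP"),
          ("the", "DET"), ("mat", "NOUN"), (".", "PUNCT")]

without_nouns = remove_word_class(tagged, {"NOUN"})
without_function_words = remove_word_class(tagged, FUNCTION_TAGS)
```

A classifier trained and evaluated on such ablated texts can then be compared against one that sees the full text, to estimate how much a given word class contributes.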

Published

2021-12-31

How to Cite

Van Nooten, J., Markov, I., & Daelemans, W. (2021). Evaluating the Impact of Word Classes on Cross-Domain Age Detection Models’ Performance. Computational Linguistics in the Netherlands Journal, 11, 71–84. Retrieved from https://clinjournal.org/clinj/article/view/122

Issue

Vol. 11 (2021)

Section

Articles
