Towards Identifying Normal Forms for Various Word Form Spellings on Twitter

Hans van Halteren; Nelleke Oostdijk

Authors

Hans van Halteren, CLST, Radboud University Nijmegen
Nelleke Oostdijk, CLST, Radboud University Nijmegen

Abstract

We take a first step towards the annotation of word forms in tweets with normal forms. Such annotation can assist research into spelling variation and the use of standard NLP tools to process tweets. This first step consists of the design of a technique to estimate whether two word forms can be considered variants of one and the same normal form. At this point we are examining word form types in isolation, i.e. without taking the context into account. We describe a word form similarity measurement which combines edit distance and context similarity over our whole tweet collection. Furthermore, we present the results of a pilot study, which we executed on 7Gw worth of Dutch tweets. We find that, while results are encouraging, various improvements to the similarity estimations are still possible.
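As an illustration of the kind of measurement the abstract describes, the following is a minimal sketch of a score that combines edit distance with context similarity. It is not the authors' method: the abstract does not say how the two components are computed or combined. The one-token context window, the cosine over neighbour counts, the length-normalised Levenshtein distance, the linear weighting, and all function names (`edit_distance`, `form_similarity`, `context_vectors`, `cosine`, `variant_score`) are assumptions made for this example only.

```python
from collections import Counter
from math import sqrt


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (insert, delete, substitute all cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def form_similarity(a: str, b: str) -> float:
    """Edit distance rescaled to [0, 1]; 1.0 means identical spellings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def context_vectors(tweets, window=1):
    """Count, per word form type, the tokens seen within `window` positions.

    Counts are accumulated over the whole collection, so each word form
    type gets a single vector (types in isolation, not tokens in context).
    """
    vectors = {}
    for tokens in tweets:
        for i, tok in enumerate(tokens):
            start = max(0, i - window)
            neighbours = tokens[start:i] + tokens[i + 1:i + 1 + window]
            vectors.setdefault(tok, Counter()).update(neighbours)
    return vectors


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def variant_score(a, b, vectors, weight=0.5):
    """Hypothetical combination: weighted mean of form and context similarity."""
    ctx = cosine(vectors.get(a, Counter()), vectors.get(b, Counter()))
    return weight * form_similarity(a, b) + (1 - weight) * ctx


# Toy collection of tokenised tweets (invented for this sketch).
tweets = [
    ["ik", "ga", "morgen", "naar", "huis"],
    ["ik", "ga", "mrgn", "naar", "huis"],
    ["ik", "ga", "nooit", "naar", "huis"],
]
vectors = context_vectors(tweets)
score_variant = variant_score("morgen", "mrgn", vectors)
score_other = variant_score("morgen", "nooit", vectors)
```

In this toy collection all three word forms occur in the same context, so context similarity alone cannot separate them. The edit distance component is what ranks the pair "morgen"/"mrgn" above "morgen"/"nooit", which suggests why a combined measure might be preferred over either component on its own.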
