TY  - JOUR
AU  - Avontuur, Tetske
AU  - Balemans, Iris
AU  - Elshof, Laura
AU  - van Noord, Nanne
AU  - van Zaanen, Menno
PY  - 2012/12/01
Y2  - 2024/03/28
TI  - Developing a part-of-speech tagger for Dutch tweets
JF  - Computational Linguistics in the Netherlands Journal
JA  - CLIN Journal
VL  - 2
IS  - 0
SE  - Articles
DO  - 
UR  - https://clinjournal.org/clinj/article/view/14
SP  - 34-51
AB  - In this article we describe the design and creation of a part-of-speech tagger specifically for Dutch data from the popular microblogging service Twitter. Starting from the D-Coi part-of-speech tag set, which is also used in the SoNaR project, we added several Twitter-specific tags to allow the tagging of hashtags, @ mentions, emoticons and URLs. The tagger consists of the Frog tagger combined with a post-processing module that incorporates the new, Twitter-specific tags in the Frog part-of-speech output. Running the Frog tagger and the post-processing module sequentially leads to a part-of-speech tagger for Dutch tweets. Approximately 1 million tweets collected in the context of the SoNaR project were tagged by Frog and the post-processor combined. A sub-set of annotated tweets have been manually checked. Lastly, we evaluated the adapted part-of-speech tagger. This project was accomplished by eight Master’s students from Tilburg University, who had just completed a course in natural language processing. In addition to the theoretical knowledge they acquired during the course, this project, which took approximately a week, offered them hands-on experience.
ER  - 