N-gram Frequencies for Dutch Twitter Data

Gosse Bouma

Authors

Gosse Bouma University of Groningen

Abstract

This paper presents n-gram frequency data obtained from a large sample of Dutch tweets, covering a period of 4 years. After filtering out retweets, (near-)duplicates, and non-Dutch tweets, more than 2.6 billion tweets remained. These were tokenized, and frequencies were collected for n-grams of up to 5 words. A web interface allows users to obtain frequency information for spelling variants, grammatical phenomena (as reflected in n-gram patterns), monthly trends, and word clusters. All the underlying n-gram frequency data, as well as the word clusters, are available for download.
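
The counting step summarized above can be illustrated with a minimal sketch. It is not the paper's pipeline: the function names, the "RT " retweet test, the exact-duplicate check, and the lowercased whitespace tokenizer are all assumptions made for illustration, and near-duplicate detection and language identification are left out.

```python
from collections import Counter

MAX_N = 5  # the abstract reports n-grams of up to 5 words


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(tweets, max_n=MAX_N):
    """Count n-grams (n = 1..max_n), skipping retweets and exact duplicates."""
    counts = Counter()
    seen = set()
    for text in tweets:
        # Assumption: a retweet is recognised by a leading "RT " marker.
        if text.startswith("RT "):
            continue
        # Assumption: only exact duplicates are dropped here.
        if text in seen:
            continue
        seen.add(text)
        # Assumption: lowercased whitespace splitting stands in for a tokenizer.
        tokens = text.lower().split()
        for n in range(1, max_n + 1):
            counts.update(ngrams(tokens, n))
    return counts


# Hypothetical sample data, not taken from the corpus.
sample = [
    "ik ga naar huis",
    "RT ik ga naar huis",
    "ik ga naar huis",
    "ik ga naar school",
]
counts = count_ngrams(sample)
```

With this sample, the retweet and the repeated tweet are skipped, so only two tweets contribute to the counts. A `Counter` keyed by token tuples keeps all n-gram orders in one table; at the scale the abstract describes, counts would more plausibly be accumulated in sorted on-disk files per n-gram order.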

Published

2015-11-01

Issue

Vol. 5 (2015)

Section

Articles

How to Cite

Bouma, G. (2015). N-gram Frequencies for Dutch Twitter Data. Computational Linguistics in the Netherlands Journal, 5, 25-36. https://clinjournal.org/clinj/article/view/55
