Extraction of Phrase-Structure Fragments with a Linear Average Time Tree-Kernel

Andreas van Cranenburgh

Authors

Andreas van Cranenburgh, Huygens Institute for the History of the Netherlands, Royal Netherlands Academy of Arts and Sciences

Abstract

We present an algorithm and implementation for extracting recurring fragments from treebanks. The algorithm uses a tree-kernel method to extract the largest common fragments from each pair of trees. On the Wall Street Journal dataset, it achieves a thirty-fold speedup over the previously available method. It is also more general, in that it supports trees with discontinuous constituents. The resulting fragments can be used as a tree-substitution grammar or as features in classification problems such as authorship attribution and other stylometry tasks.
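To make the pairwise step concrete, here is a minimal Python sketch. It is not the paper's implementation and makes several assumptions of its own: it handles only ordinary (continuous) trees, represents a tree as a nested tuple, and finds nodes with identical productions by grouping them in a hash table, which is one way to avoid comparing every pair of nodes but need not match the kernel described in the paper. All function names are hypothetical. A fragment is extended downward as long as aligned children also share a production; where they do not, the child is left as a bare frontier non-terminal such as `(NP)`.

```python
from collections import defaultdict

# Hypothetical representation for this sketch: a tree is a tuple
# (label, child1, child2, ...) and a terminal (word) is a plain str.
# Discontinuous constituents are not handled here.


def flatten(tree):
    """List the internal nodes of `tree` in preorder.

    Each entry is (production, child_indices, parent_index, position), where
    child_indices holds None for terminal children.
    """
    nodes = []

    def visit(node, parent, position):
        idx = len(nodes)
        nodes.append(None)  # placeholder, filled in after the children
        label, *children = node
        kids = [None if isinstance(c, str) else visit(c, idx, pos)
                for pos, c in enumerate(children)]
        # Mark each child as word or non-terminal so that a word "NP"
        # and a node labeled NP never yield the same production.
        child_keys = tuple((isinstance(c, str), c if isinstance(c, str) else c[0])
                           for c in children)
        nodes[idx] = ((label, child_keys), kids, parent, position)
        return idx

    visit(tree, None, None)
    return nodes


def matching_pairs(a, b):
    """All node pairs (i, j) whose productions are identical.

    Nodes of `b` are grouped by production in a hash table (an assumption of
    this sketch), so only nodes that actually match are ever paired.
    """
    by_production = defaultdict(list)
    for j, (prod, _, _, _) in enumerate(b):
        by_production[prod].append(j)
    return {(i, j) for i, (prod, _, _, _) in enumerate(a)
            for j in by_production.get(prod, ())}


def fragment(a, b, i, j, matches):
    """Bracketed string for the largest fragment rooted at matched pair (i, j)."""
    (label, child_keys), kids_a, _, _ = a[i]
    kids_b = b[j][1]
    parts = []
    for (is_word, child_label), ca, cb in zip(child_keys, kids_a, kids_b):
        if is_word:
            parts.append(child_label)  # word shared by both trees
        elif (ca, cb) in matches:
            parts.append(fragment(a, b, ca, cb, matches))  # keep extending
        else:
            parts.append(f"({child_label})")  # frontier non-terminal
    return f"({label} {' '.join(parts)})"


def common_fragments(tree1, tree2):
    """Set of maximal fragments shared by two trees."""
    a, b = flatten(tree1), flatten(tree2)
    matches = matching_pairs(a, b)
    result = set()
    for i, j in matches:
        _, _, parent_a, pos_a = a[i]
        _, _, parent_b, pos_b = b[j]
        # Skip pairs already covered by a fragment rooted one level higher.
        if (parent_a is not None and parent_b is not None
                and pos_a == pos_b and (parent_a, parent_b) in matches):
            continue
        result.add(fragment(a, b, i, j, matches))
    return result


t1 = ("S", ("NP", "Mary"), ("VP", ("VB", "sleeps")))
t2 = ("S", ("NP", "John"), ("VP", ("VB", "sleeps")))
print(common_fragments(t1, t2))  # {'(S (NP) (VP (VB sleeps)))'}
```

In this example the two subject noun phrases differ, so `NP` stays a frontier node while the shared `VP` subtree is kept whole. Extracting fragments from a full treebank would repeat this for each pair of trees and collect the results; the sketch leaves out that outer loop as well as the discontinuous constituents the abstract mentions.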

Published

2014-12-01

How to Cite

van Cranenburgh, A. (2014). Extraction of Phrase-Structure Fragments with a Linear Average Time Tree-Kernel. Computational Linguistics in the Netherlands Journal, 4, 3–16. Retrieved from https://clinjournal.org/clinj/article/view/36

Issue

Vol. 4 (2014)

Section

Articles
