Transformation-based tree-to-tree alignment

Gideon Kotzé, University of Groningen, Groningen, the Netherlands

Abstract

Previous experiments suggest that a rule-based approach to tree alignment error correction is an effective complement to statistical alignment. We show how, using relatively few features, an implementation of Brill’s Transformation-Based Learning algorithm improves the results of a high-precision model of the statistical aligner Lingua-Align. Using our system to correct already tree-aligned data, we achieve balanced F-scores of 80.6 on our test set and 85.2 on our development test set. When the system is used as a tree aligner on word-aligned data, our best F-scores with the same model are 78.7 and 83.0, respectively. Finally, we apply a pipeline of alignment and error correction tools to create several versions of a large Dutch-to-English parallel treebank spanning various domains, for use in a syntax-based MT system. We conclude that transformation-based learning is a promising approach for the large-scale creation of parallel treebanks for various NLP purposes.
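Brill-style transformation-based learning starts from an initial annotation (here, the links proposed by a statistical aligner), compares it with a gold standard, and greedily learns an ordered list of rules. Each rule is chosen because applying it removes the largest net number of errors. The Python sketch below illustrates that loop for binary link decisions between source and target tree nodes, and scores the result with the balanced F-score (the harmonic mean of precision and recall). It is a minimal sketch under stated assumptions: the `Candidate` structure, the single "if feature equals value, set link" rule template, and the feature names `cat_match` and `leaf_ratio` are all hypothetical. They are not the features or templates used in this article or in Lingua-Align.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical representation: each candidate link between a source-tree node
# and a target-tree node is described by a few categorical features.
@dataclass
class Candidate:
    features: Dict[str, str]
    gold: bool       # is the link in the gold-standard alignment?
    predicted: bool  # current decision, initialised from the statistical aligner

# Hypothetical rule template: "if feature == value, set the link decision to `link`".
Rule = Tuple[str, str, bool]


def gain(rule: Rule, data: List[Candidate]) -> int:
    """Net number of errors removed if `rule` were applied to `data`."""
    feat, value, link = rule
    score = 0
    for c in data:
        if c.features.get(feat) == value and c.predicted != link:
            # The rule would flip this decision: +1 if it fixes an error, -1 if it creates one.
            score += 1 if link == c.gold else -1
    return score


def apply_rule(rule: Rule, data: List[Candidate]) -> None:
    feat, value, link = rule
    for c in data:
        if c.features.get(feat) == value:
            c.predicted = link


def learn(data: List[Candidate], min_gain: int = 1, max_rules: int = 50) -> List[Rule]:
    """Greedy, error-driven learning of an ordered rule list (Brill-style TBL)."""
    learned: List[Rule] = []
    for _ in range(max_rules):
        # Instantiate the template only from feature values seen on current errors.
        candidates = {
            (f, v, c.gold)
            for c in data if c.predicted != c.gold
            for f, v in c.features.items()
        }
        if not candidates:
            break
        # sorted() makes tie-breaking between equally good rules deterministic.
        best = max(sorted(candidates), key=lambda r: gain(r, data))
        if gain(best, data) < min_gain:
            break
        apply_rule(best, data)
        learned.append(best)
    return learned


def correct(rules: List[Rule], data: List[Candidate]) -> None:
    """Apply learned rules, in the order they were learned, to new aligner output."""
    for rule in rules:
        apply_rule(rule, data)


def f_score(data: List[Candidate]) -> float:
    """Balanced F-score (harmonic mean of precision and recall) over link decisions."""
    tp = sum(c.predicted and c.gold for c in data)
    fp = sum(c.predicted and not c.gold for c in data)
    fn = sum(c.gold and not c.predicted for c in data)
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0


# Toy training data with invented features: the initial aligner misses two
# links and proposes one wrong link.
train = [
    Candidate({"cat_match": "yes", "leaf_ratio": "high"}, gold=True, predicted=False),
    Candidate({"cat_match": "yes", "leaf_ratio": "high"}, gold=True, predicted=False),
    Candidate({"cat_match": "no", "leaf_ratio": "low"}, gold=False, predicted=True),
    Candidate({"cat_match": "yes", "leaf_ratio": "low"}, gold=False, predicted=False),
]
print(f_score(train))  # 0.0 before correction
rules = learn(train)
print(rules)  # [('leaf_ratio', 'high', True), ('cat_match', 'no', False)]
print(f_score(train))  # 1.0 after correction
```

Because the learned rules are applied in order, a later rule can override a decision made by an earlier one. `min_gain` serves as the stopping threshold: learning halts once no candidate rule removes at least that many errors.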

Published
2012-12-01
How to Cite
Kotzé, G. (2012). Transformation-based tree-to-tree alignment. Computational Linguistics in the Netherlands Journal, 2, 71-96. Retrieved from https://clinjournal.org/clinj/article/view/17
Section
Articles