Transformation-based tree-to-tree alignment
Abstract
Previous experiments suggest that a rule-based approach to tree alignment error correction serves to be an effective complement to statistical alignment. We show how, using relatively few features, an implementation of Brill’s Transformation-Based Learning algorithm improves the results of a high precision model of the statistical aligner Lingua-Align. Using our system to correct already tree aligned data, we achieve balanced F-scores of 80.6 on our test set and 85.2 on our development test set. Using it as a tree aligner on word aligned data, our best F-scores using the same model amount to 78.7 and 83.0 respectively. Finally, we apply a pipeline of alignment and error correction tools to create several versions of a large parallel treebank consisting of various domains for Dutch to English for use in a syntax-based MT system. We conclude that transformation-based learning is a promising approach for the large-scale creation of parallel treebanks for various NLP purposes