Location-focused translation of flooding events in news articles
Abstract
We are interested in the automatic extraction of information on flooding events in the Philippines from local newspapers. Because most existing information extraction tools have been developed for English, this study investigates the feasibility of using open-source machine translation (MT) tools to translate Tagalog news items into English. Location names require extra care in translation, as precise location information is indispensable for effective disaster management. We fine-tuned an open-source multilingual MT model on disaster news in Tagalog, investigated several methods to enhance its performance on location translation, and compared the resulting model versions using a custom location-focused evaluation metric. To this end, we introduce two new domain-specific Tagalog-English datasets for fine-tuning and evaluation. We tested fine-tuning on domain-specific data and two masking techniques, using either general masks or database look-up of names. Contrary to our expectations, our findings show that the base open-source multilingual MT model was already proficient in location translation. Our analysis indicates that fine-tuning on domain-specific data improves overall machine translation quality. Our manual analysis provides insight into specific errors in location translation and the distinct effects of each fine-tuning technique.
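To make the idea of masking with database look-up concrete, the following is a minimal sketch and not the implementation used in this study. It assumes a small hand-made gazetteer (`GAZETTEER`), a placeholder format (`__LOC0__`), and helper names (`mask_locations`, `unmask_locations`) that are all hypothetical. Location names are replaced by placeholders before translation and restored from the look-up table afterwards; the MT step itself is left out.

```python
import re

# Hypothetical gazetteer: Tagalog surface form -> English form.
# A real system would presumably query a much larger place-name database.
GAZETTEER = {
    "Lungsod ng Marikina": "Marikina City",
    "Kalakhang Maynila": "Metro Manila",
}


def mask_locations(text, gazetteer):
    """Replace known location names with numbered placeholders.

    Returns the masked text and a dict mapping each placeholder
    to the English name that should be restored after translation.
    """
    slots = {}
    # Try longer names first so a short name never splits a longer one.
    for name in sorted(gazetteer, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(name) + r"\b")
        if pattern.search(text):
            token = f"__LOC{len(slots)}__"
            text = pattern.sub(token, text)
            slots[token] = gazetteer[name]
    return text, slots


def unmask_locations(text, slots):
    """Substitute the looked-up English names back into the translation."""
    for token, english in slots.items():
        text = text.replace(token, english)
    return text


# Illustrative sentence (not taken from the datasets in this study).
source = "Binaha ang Lungsod ng Marikina at ilang bahagi ng Kalakhang Maynila."
masked, slots = mask_locations(source, GAZETTEER)

# Stand-in for MT output, assuming the model copies placeholders unchanged.
translated = "__LOC0__ was flooded, along with parts of __LOC1__."
restored = unmask_locations(translated, slots)
```

A sketch like this only works if the MT model leaves the placeholders intact, which is itself an assumption that would need to be checked for any given model.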