The SAWA corpus: a parallel corpus English - Swahili

Pauw, Guy De; Wagacha, Peter Waiganjo; de Schryver, Gilles-Maurice

Date

2009

Type

Presentation

Abstract

Research in data-driven methods for Machine Translation has greatly benefited from the increasing availability of parallel corpora. Processing the same text in two different languages yields useful information on how words and phrases are translated from a source language into a target language. To exploit this, a parallel corpus is typically aligned by linking linguistic tokens in the source language to the corresponding units in the target language. An aligned parallel corpus therefore facilitates the automatic development of a machine translation system and can also bootstrap annotation through projection. In this paper, we describe data collection and annotation efforts, as well as preliminary experimental results, for an English-Swahili parallel corpus.
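The abstract mentions bootstrapping annotation "through projection" over an aligned corpus. The sketch below illustrates that idea under stated assumptions. Word alignment is represented as a set of hypothetical (source index, target index) links, and each source token's tag is copied onto the target tokens it links to. The function name `project_tags`, the "+"-joining policy for many-to-one links, the tag set, and the toy sentence pair are all illustrative choices, not the method described in the paper. In practice, the alignment links would come from an automatic word aligner.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical representation: an alignment is a set of
# (source_index, target_index) links between tokens of one sentence pair.
Alignment = Set[Tuple[int, int]]


def project_tags(
    source_tags: List[str],
    target_len: int,
    alignment: Alignment,
) -> List[Optional[str]]:
    """Copy each source token's tag onto the target tokens it is linked to.

    Assumed policy (illustrative only): when several source tokens align to
    one target token, their tags are joined with '+' in source order; target
    tokens with no link are left as None (unannotated).
    """
    linked: Dict[int, List[int]] = {}
    for s, t in alignment:
        if not (0 <= s < len(source_tags) and 0 <= t < target_len):
            raise ValueError(f"link {(s, t)} is out of range")
        linked.setdefault(t, []).append(s)

    projected: List[Optional[str]] = []
    for t in range(target_len):
        sources = sorted(linked.get(t, []))
        projected.append("+".join(source_tags[s] for s in sources) if sources else None)
    return projected


if __name__ == "__main__":
    # Hypothetical toy pair: English "I like books" / Swahili "Ninapenda vitabu".
    # Swahili marks the subject on the verb, so "I" and "like" both align to
    # the single word "Ninapenda".
    tags = ["PRON", "VERB", "NOUN"]
    target = ["Ninapenda", "vitabu"]
    links = {(0, 0), (1, 0), (2, 1)}
    print(list(zip(target, project_tags(tags, len(target), links))))
```

Running the script prints `[('Ninapenda', 'PRON+VERB'), ('vitabu', 'NOUN')]`. The many-to-one link shows why projection between English and an agglutinative language such as Swahili needs an explicit policy for merged tokens. Joining tags is only one such option.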

URI

http://dl.acm.org/citation.cfm?id=1564511
http://hdl.handle.net/11295/37612

Citation

Proceedings of the EACL 2009 Workshop on Language Technologies for African Languages – AfLaT 2009, pages 9–16, Athens, Greece, 31 March 2009

Publisher

Association for Computational Linguistics

Affiliations

School of Computing and Informatics, University of Nairobi, Kenya

African Languages and Cultures, Ghent University, Belgium

Xhosa Department, University of the Western Cape, South Africa

CNTS - Language Technology Group, University of Antwerp, Belgium

Collections

Faculty of Science & Technology (FST)