Developing an annotated corpus for Gĩkũyũ using language-independent machine learning techniques

Wagacha, Peter W; De Pauw, Guy; Getao, Katherine W

Date

2006

Author

Wagacha, Peter W

De Pauw, Guy

Getao, Katherine W

Type

Presentation

Abstract

Networking the development of computational resources for African languages can be greatly advanced if researchers aim to develop tools that are to a large extent language-independent and therefore reusable for other languages. In this paper we describe a particular case study, namely the development of an annotated corpus of Gĩkũyũ, using language-independent machine learning techniques. The general aim of our work on Gĩkũyũ is two-fold: on the one hand we wish to digitally preserve this resource-scarce language, while on the other hand it serves as a feasibility study of using language-independent machine learning techniques for linguistic annotation of corpora. To this end we investigate established annotation induction techniques like unsupervised learning and knowledge transfer. These methods can provide interesting perspectives for the linguistic description of many other resource-scarce languages.

URI

http://hdl.handle.net/11295/44325

Citation

Peter W. Wagacha, Guy De Pauw and Katherine W. Getao (2006). Developing an annotated corpus for Gĩkũyũ using language-independent machine learning techniques.

Publisher

School of Computing & Informatics

CNTS - Language Technology Group, University of Antwerp, Antwerpen, Belgium

Collections

Faculty of Science & Technology (FST) [853]