Resource-Light Bantu Part-of-Speech Tagging

Pauw, Guy De; de Schryver, Gilles-Maurice; Loo, Janneke van de

Date

2012

Author

Pauw, Guy De

de Schryver, Gilles-Maurice

Loo, Janneke van de

Type

Presentation

Abstract

Recent scientific publications on data-driven part-of-speech tagging of Sub-Saharan African languages have reported encouraging accuracy scores, using off-the-shelf tools and often fairly limited amounts of training data. Unfortunately, no research has yet explored which types of linguistic features contribute to accurate part-of-speech tagging for the languages under investigation. This paper describes feature selection experiments with a memory-based tagger, as well as a resource-light alternative approach. Experimental results show that contextual information is often not strictly necessary to achieve good accuracy when tagging Bantu languages, and that decent results can be achieved with a very straightforward unigram approach based on orthographic features.

URI

http://tshwanedje.com/publications/BantuPOS.pdf
http://hdl.handle.net/11295/37634

Citation

Workshop on Language Technology for Normalisation of Less-Resourced Languages (SALTMIL8/AfLaT2012)

Publisher

CLiPS - Computational Linguistics Group, University of Antwerp, Belgium

School of Computing and Informatics, University of Nairobi

Collections

Faculty of Science & Technology (FST)