Resource-Light Bantu Part-of-Speech Tagging
Date
2012
Author
Pauw, Guy De
de Schryver, Gilles-Maurice
Loo, Janneke van de
Type
Presentation
Language
en
Abstract
Recent scientific publications on data-driven part-of-speech tagging of Sub-Saharan African languages have reported encouraging accuracy scores, using off-the-shelf tools and often fairly limited amounts of training data. Unfortunately, no research efforts exist that explore which types of linguistic features contribute to accurate part-of-speech tagging for the languages under investigation. This paper describes feature selection experiments with a memory-based tagger, as well as a resource-light alternative approach. Experimental results show that contextual information is often not strictly necessary to achieve good accuracy when tagging Bantu languages, and that decent results can be achieved using a very straightforward unigram approach based on orthographic features.
Citation
Workshop on Language Technology for Normalisation of Less-Resourced Languages (SALTMIL8/AfLaT2012)
Publisher
CLiPS - Computational Linguistics Group, University of Antwerp, Belgium; School of Computing and Informatics, University of Nairobi