dc.contributor.author | Pauw, Guy De | |
dc.contributor.author | de Schryver, Gilles-Maurice | |
dc.contributor.author | Loo, Janneke van de | |
dc.date.accessioned | 2013-06-21T13:47:48Z | |
dc.date.available | 2013-06-21T13:47:48Z | |
dc.date.issued | 2012 | |
dc.identifier.citation | Workshop on Language Technology for Normalisation of Less-Resourced Languages (SALTMIL8/AfLaT2012) | en |
dc.identifier.uri | http://tshwanedje.com/publications/BantuPOS.pdf | |
dc.identifier.uri | http://hdl.handle.net/11295/37634 | |
dc.description.abstract | Recent scientific publications on data-driven part-of-speech tagging of Sub-Saharan African languages have reported encouraging accuracy scores, using off-the-shelf tools and often fairly limited amounts of training data. Unfortunately, no research efforts exist that explore which types of linguistic features contribute to accurate part-of-speech tagging for the languages under investigation. This paper describes feature selection experiments with a memory-based tagger, as well as a resource-light alternative approach. Experimental results show that contextual information is often not strictly necessary to achieve good accuracy for tagging Bantu languages and that decent results can be achieved using a very straightforward unigram approach, based on orthographic features. | en |
dc.language.iso | en | en |
dc.title | Resource-Light Bantu Part-of-Speech Tagging | en |
dc.type | Presentation | en |
local.publisher | CLiPS - Computational Linguistics Group, University of Antwerp, Belgium | en |
local.publisher | School of Computing and Informatics, University of Nairobi | en |