Unsupervised induction of Dholuo word classes using maximum entropy learning
Date
02-07-13
Author
De Pauw, Guy
Wagacha, Peter W
Abade, Dorothy Atieno
Type
Working Paper
Language
en
Abstract
This paper describes a proof-of-principle experiment in
which maximum entropy learning is used for the automatic induction of
word classes for the Western Nilotic language of Dholuo. The proposed
approach extracts shallow morphological and contextual features for each
word of a 300k text corpus of Dholuo. These features provide a layer of
linguistic abstraction that enables the extraction of general word classes.
We provide a preliminary evaluation of the proposed method in terms
of language model perplexity and through a simple case study of the
paradigm of the verb stem "somo".
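
To make the idea of shallow morphological and contextual features concrete, here is a minimal sketch in Python. The abstract does not specify the feature set, so everything in it is an assumption made for illustration: the function name `shallow_features`, the use of character prefixes and suffixes of up to three characters, and the one-token context window. The example tokens are placeholders, not data from the Dholuo corpus; only the stem "somo" comes from the abstract.

```python
from typing import Dict, List


def shallow_features(tokens: List[str], i: int, affix_len: int = 3) -> Dict[str, str]:
    """Return illustrative morphological and contextual features for tokens[i].

    Hypothetical feature set: the paper's actual features are not given
    in the abstract.
    """
    word = tokens[i].lower()
    feats = {"word": word}
    # Shallow morphology: character prefixes and suffixes up to affix_len.
    for n in range(1, min(affix_len, len(word)) + 1):
        feats[f"prefix{n}"] = word[:n]
        feats[f"suffix{n}"] = word[-n:]
    # Context: the immediately neighbouring tokens, with boundary markers.
    feats["prev"] = tokens[i - 1].lower() if i > 0 else "<s>"
    feats["next"] = tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>"
    return feats


# Placeholder tokens, not real corpus data.
sentence = ["tok_a", "somo", "tok_b"]
features = shallow_features(sentence, 1)
```

Feature dictionaries of this kind could be passed to any maximum entropy (multinomial logistic regression) learner; which learner and which features the authors used is not stated here.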