Automatic construction of a Kiswahili corpus from the World Wide Web

Miriti, Evans Ak

dc.contributor.author	Miriti, Evans Ak
dc.date.accessioned	2013-07-24T08:16:36Z
dc.date.available	2013-07-24T08:16:36Z
dc.date.issued	2005
dc.identifier.citation	K, G, E. M. 2005. Automatic construction of a Kiswahili corpus from the World Wide Web. SPECIAL TOPICS IN COMPUTING AND ICT RESEARCH: Measuring Computing Research Excellence and Vitality. Kampala: Fountain Publishers	en
dc.identifier.uri	http://profiles.uonbi.ac.ke/eamiriti/publications/automatic-construction-kiswahili-corpus-world-wide-web
dc.identifier.uri	http://erepository.uonbi.ac.ke:8080/xmlui/handle/123456789/50524
dc.description.abstract	A corpus is a large collection of language data in written form, spoken form, or both. It can be used to construct a language model, which many language technology applications rely on, including speech-to-text, optical character recognition, machine translation and spell checking. The easiest way to create a text corpus is to put together electronic text documents. For most languages, however, gathering a large collection of electronic texts by hand is time-consuming and challenging, and the monotony of the task inevitably means that less attention is paid to errors that find their way into the collection. This paper describes the workings of an application that was used to build a Kiswahili corpus from the Internet for use in natural language processing applications.
dc.language.iso	en	en
dc.title	Automatic construction of a Kiswahili corpus from the World Wide Web	en
dc.type	Article	en
local.publisher	Centre For Biotechnology & Bioinformatics Publications	en

Files in this item

Name: Miriti_Automatic creation of a ...
Size: 218.5Kb
Format: PDF
Description: Fulltext.pdf

This item appears in the following Collection(s)

Faculty of Science & Technology (FST) [4283]