dc.contributor.author: Ngoni, Velma N
dc.date.accessioned: 2022-11-02T11:22:55Z
dc.date.available: 2022-11-02T11:22:55Z
dc.date.issued: 2022
dc.identifier.uri: http://erepository.uonbi.ac.ke/handle/11295/161610
dc.description.abstract: It is important for human beings to communicate globally, and because languages differ, a translator is vital for effective use of digital services. International marketers, for example, use about ten languages, Kiswahili excluded, despite it being a national language in most of the Sub-Saharan African countries. According to The Cambridge Encyclopedia of the English Language, an estimated 9% of Kenyans speak English, which is a major international language. The other 91% of Kenyans, who do not speak English, speak either Swahili or their tribal languages and are therefore excluded digitally. Even though the vernacular languages are spoken by approximately fifty million people in Kenya, they are resource-scarce from a language-technology standpoint. Machine translation models for these low-resourced African languages are scarce, causing a lack of digital inclusion for many Africans. An English-Indigenous Language translator should therefore be designed for the digital inclusion of these non-English-speaking communities. In this study, an exploratory methodology was used to develop the English to Luhya machine translation prototype. Exploratory research is a methodological approach that investigates research questions that have not previously been studied in depth. The model was successfully developed using an encoder-decoder architecture. It had a hidden layer size of 128, and the embeddings had 256 units. Training ran for 50 epochs with a batch size of 100. The ADAM optimizer was used with a constant learning rate of 0.0005 to update the model weights. The model was evaluated using the BLEU score as the main evaluation metric, with WER, SER, and TER complementing the results. The model achieved a highest BLEU score of 0.55, just 0.05 short of the median range of 0.6 – 0.7 that has been achieved by other researchers. Compared to similar research on low-resourced languages, it scored modestly but outperformed translation of English to Kiswahili (0.20) using Statistical Machine Translation.
For future work, the key initiative is the creation of a publicly available corpus, which will serve as a catalyst for research in this area. The lack of such a corpus could be eased by audio resources obtained through speech recognition and speech-to-text implementation, since Bukusu is primarily spoken and the lack of standardization in writing complicates the creation of clean reference sets and consistent evaluation. This study could also benefit from reference models for Bukusu Named Entity Recognition and Part-of-Speech tagging to improve translation accuracy. Since the structure of the Bukusu language is similar to that of Swahili, the key focus should first be on developing open-source NLP tools for the Swahili language. With these, researchers working on Bukusu and other low-resourced East African Bantu languages would be able to transfer the Swahili models and annotations to their respective languages. [en_US]
dc.language.iso: en [en_US]
dc.publisher: University of Nairobi [en_US]
dc.rights: Attribution-NonCommercial-NoDerivs 3.0 United States [*]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/3.0/us/ [*]
dc.title: English – Bukusu Automatic Machine Translation for Digital Services Inclusion in E-governance [en_US]
dc.type: Thesis [en_US]
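
The abstract names BLEU as the main evaluation metric, with WER and SER among the complementary ones. As a rough illustration only, and not the evaluation code used in the thesis, the Python sketch below computes a simplified corpus-level BLEU (one reference per sentence, up to 4-grams, no smoothing), word error rate, and sentence error rate. The function names and the placeholder token strings are assumptions made for this example; no data from the study is reproduced.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of order n in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(references, hypotheses, max_n=4):
    """Simplified corpus-level BLEU: single reference, uniform weights, no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    ref_len = hyp_len = 0
    for ref, hyp in zip(references, hypotheses):
        ref_tok, hyp_tok = ref.split(), hyp.split()
        ref_len += len(ref_tok)
        hyp_len += len(hyp_tok)
        for n in range(1, max_n + 1):
            ref_counts, hyp_counts = ngrams(ref_tok, n), ngrams(hyp_tok, n)
            # Clipped counts: an n-gram is credited at most as often as it occurs in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    # Without smoothing, any n-gram order with zero matches gives a score of 0.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1] / len(ref)


def sentence_error_rate(references, hypotheses):
    """Fraction of hypotheses that do not match their reference exactly."""
    wrong = sum(r.split() != h.split() for r, h in zip(references, hypotheses))
    return wrong / len(references)


if __name__ == "__main__":
    # Placeholder tokens only; these are not Bukusu sentences.
    refs = ["a b c d e", "f g h i"]
    hyps = ["a b c d e", "f x h"]
    print("BLEU:", corpus_bleu(refs, hyps))
    print("WER :", [word_error_rate(r, h) for r, h in zip(refs, hyps)])
    print("SER :", sentence_error_rate(refs, hyps))
```

Scores from a simplified implementation like this are not directly comparable with published figures, which typically depend on the tokenization and smoothing used by the evaluation toolkit.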


Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 United States