Show simple item record

dc.contributor.author: Ruoro, Simon Wachira
dc.date.accessioned: 2015-08-27T08:23:04Z
dc.date.available: 2015-08-27T08:23:04Z
dc.date.issued: 2014
dc.identifier.citation: Masters of Science in Computer Science (en_US)
dc.identifier.uri: http://hdl.handle.net/11295/90185
dc.description.abstract: When large quantities of technical text are translated manually, it is very difficult to produce consistent translations of recurrent stretches of text, such as paragraphs, sentences and phrases. Reusing old translations, stored as translation memories of previous versions of handbooks, minimizes the chances of producing variant translations of the same source sentence and gives users a better understanding of word usage in sentences. We developed an English-Swahili example-based machine translation (EBMT) system that exploits a bilingual corpus to find examples matching the source-language input. The translation examples were extracted from a collection of parallel, sentence-aligned English-Swahili texts. We split phrases or paragraphs into sentences using N-grams. In previous research, many methods used N-gram clues to split sentences. In this project, to supplement N-gram-based splitting methods, we introduced another clue: sentence similarity based on edit distance. In our splitting method, candidate sentences are generated by splitting a paragraph based on N-grams, and the best candidate is selected by measuring sentence similarity. We conducted experiments using two EBMT systems, one using the word and the other using the sentence as the translation unit. The experiments showed that the system performs slightly better when using sentence similarity: a considerable success rate (above 95% at sentence level) was achieved in constructing a database of truthfully corresponding sentence units. Using words as the translation unit also showed good performance, above 65%. Classifying texts by domain/topic also showed some improvement.
A translation memory (TM) is a repository in which the user stores previous translations, helping to improve translator productivity and consistency. A TM system functions as an information retrieval system that tries to retrieve one or more suggestions from a TM database to assist the translator in the current translation task, or in learning how a sentence can be used in different contexts or domains. (en_US)
dc.language.iso: en (en_US)
dc.publisher: University of Nairobi (en_US)
dc.title: A Parallel Corpus Based Translation Using Sentence Similarity (en_US)
dc.type: Thesis (en_US)
dc.type.material: en_US (en_US)
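
The abstract describes retrieving translation examples by edit-distance sentence similarity. The record does not include the thesis code, so the following is only a minimal sketch of that general idea, not the author's implementation. It assumes word-level Levenshtein distance, a similarity score normalized by the longer sentence's length, and a tiny hypothetical translation memory; the function names (`edit_distance`, `similarity`, `best_match`) and the toy English-Swahili pairs are illustrative assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (here: lists of words)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, item_a in enumerate(a, start=1):
        curr = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(s1, s2):
    """Similarity in [0, 1]; normalizing by the longer sentence is an assumption."""
    t1, t2 = s1.lower().split(), s2.lower().split()
    longest = max(len(t1), len(t2))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(t1, t2) / longest


def best_match(query, memory):
    """Return the (source, target) pair whose source is most similar to query."""
    return max(memory, key=lambda pair: similarity(query, pair[0]))


# Hypothetical toy translation memory (illustrative pairs only).
TM = [
    ("good morning", "habari ya asubuhi"),
    ("thank you very much", "asante sana"),
    ("welcome", "karibu"),
]

source, target = best_match("thank you so much", TM)
print(source, "->", target)
```

A real system would also need a similarity threshold below which no suggestion is returned; that cutoff is not stated in the abstract and is left out here.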

