dc.contributor.author: Githiari, Lawrence M
dc.date.accessioned: 2014-09-03T06:06:49Z
dc.date.available: 2014-09-03T06:06:49Z
dc.date.issued: 2014
dc.identifier.citation: School of Computing and Informatics, (en_US)
dc.identifier.uri: http://hdl.handle.net/11295/74001
dc.description.abstract: Many applications that rely on relational databases are accessed by non-technical or casual users who prefer to use natural language. Examples include government data repositories such as government tender information portals, and application-specific databases such as agricultural support systems. Natural language (NL) processing for database access, which remains an unresolved issue, is the main problem addressed in this work. The specific challenges include the lack of a language- and domain-independent methodology for understanding unrestrained NL text that accesses monolingual or cross-lingual databases, as well as the extraction of concepts from database schemas. It is demonstrated that an ontology-based approach is technically feasible for handling some of the challenges facing NL query processing for database access. The Ontology Concept Modelling (OCM) approach relies on the ability to convert databases to ontologies, from which the underlying concepts are obtained. The database concepts are matched against the concepts obtained from natural language queries using a semantically-augmented Levenshtein distance algorithm. This thesis presents the architecture and the associated algorithms for an OCM-based model for NL access to databases. To evaluate and benchmark the OCM model, data was generated from a prototype based on the developed model. Quantitative parameters (accuracy, precision, recall and F-score) and qualitative measures (domain independence, language independence, support for cross-lingual querying and the effect of query complexity on the model) were evaluated across five data sets. Studies were conducted for English, for Kiswahili and, in a cross-lingual manner, for the English-Kiswahili language pair, and these demonstrate the attainment of language and domain independence for database access.
For this language pair, it is also shown empirically that incorporating a bilingual dictionary at gazetteer level is adequate for cross-lingual data retrieval. To evaluate the performance of the developed OCM model, test beds with monolingual, cross-lingual and cross-domain performance measurement capacity were designed to test various aspects of the model. Tests were then conducted, and the results indicated that OCM has a marginally better precision, at 0.861, than the other benchmarking models selected for comparison. Further, OCM has an average F-score of 0.78, which compares well with the other benchmarking models. The main contributions of this work, especially the OCM architecture, processing algorithms such as OWoRA (Ontology Words Recovery Algorithm), frameworks such as QuSeT (Query Semantics Transfer framework) and the evaluation models, are significant to the research and developer communities because they provide novel approaches to NL database access and to model evaluation. Keywords: Natural Language Query, Database Access, Ontology Concept Modeling (en_US)
dc.language.iso: en (en_US)
dc.publisher: University of Nairobi (en_US)
dc.title: Natural language access to relational databases: an ontology concept mapping (OCM) approach (en_US)
dc.type: Thesis (en_US)
dc.type.material: en_US (en_US)