dc.description.abstract | Many applications that are accessed by non-technical or casual users, who prefer the use of
natural language, rely on relational databases. Examples of such applications include
government data repositories such as government tender information portals, and
application-specific databases such as agricultural support systems. Natural language (NL)
processing for database access remains an unresolved issue and is the main problem
addressed in this work. The specific challenges include the lack of a language- and
domain-independent methodology for understanding unrestrained NL text that accesses
monolingual or cross-lingual databases, and the extraction of concepts from database schemas.
It is demonstrated that an ontology-based approach is a technically feasible way to handle some
of the challenges facing NL query processing for database access. The Ontology Concept
Modelling (OCM) approach relies on the ability to convert databases to ontologies from
which we obtain the underlying concepts. The database concepts are matched against the
concepts obtained from natural language queries using a semantically-augmented
Levenshtein distance algorithm. This thesis presents the architecture and the associated
algorithms for an OCM-based model for NL access to databases.
In order to evaluate and benchmark the OCM model, data was generated from a prototype
based on the developed OCM-based model. Quantitative parameters such as accuracy,
precision, recall and the F-score and qualitative measures such as domain-independence,
language independence, support for cross-lingual querying and the effect of query
complexity on the model were evaluated across five data sets. Studies were conducted for
English, for Kiswahili, and cross-lingually for the English-Kiswahili language pair,
demonstrating that language and domain independence for database access are
attainable. For this language pair, it is also shown empirically that incorporating a
bilingual dictionary at the gazetteer level is adequate for cross-lingual data retrieval.
To evaluate the performance of the developed OCM model, test beds covering monolingual,
cross-lingual and cross-domain performance measurements were
designed to test various aspects of the model. Tests were then conducted, and the results
indicated that OCM has a marginally better precision, 0.861, than the other benchmark
models selected for comparison. Further, OCM has an average F-score of 0.78,
which compares well with the other benchmark models.
The main contributions of this work, especially the OCM architecture, processing
algorithms such as OWoRA (Ontology Words Recovery Algorithm), frameworks such
as QuSeT (Query semantics transfer framework) and the evaluation models, are
significant to the research and developer communities because they provide novel approaches to
NL database access and model evaluation techniques.
Keywords: Natural Language Query, Database Access, Ontology Concept Modelling | en_US |