A MapReduce tool for data mining and data optimization: case of teachers' web portal
Abstract
Limited computing resources, together with the data management and processing operations that depend on them,
are a paramount challenge affecting the Ministry of Education and, more broadly, the core functions of the
Government of Kenya. Unless the problem is treated as a priority now, it will persist for some time and could
become catastrophic.
MapReduce, a programming technique for cluster computing, was studied. Its primary advantage is that it
automatically parallelizes applications written in a functional programming style, which allows researchers
with no specific knowledge of parallel programming to achieve parallelism in a distributed cluster
environment. Various optimization techniques were considered during the design stage to make the system
highly efficient on shared-memory multi-core architectures within a cluster computing environment.
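Why functional-style code can be parallelized automatically can be illustrated with a small Python sketch. It is a generic example, not code from the tool, and the function names are hypothetical. Because `square` is pure and addition is associative with identity 0, splitting the input into any number of partitions and combining the partial results gives the same answer as the sequential fold; that freedom is what a MapReduce runtime exploits when it distributes work.

```python
from functools import reduce
from operator import add
from typing import List


def square(x: int) -> int:
    # Pure function: the result depends only on the argument, with no shared state,
    # so calls on different elements can run on different cores in any order.
    return x * x


def sequential(data: List[int]) -> int:
    # Reference semantics: map every element, then fold with an associative operator.
    return reduce(add, map(square, data), 0)


def partitioned(data: List[int], parts: int) -> int:
    # What a MapReduce runtime may do instead: split the input, process each
    # partition independently, then combine the partial results.
    partitions = [data[i::parts] for i in range(parts)]
    partials = [reduce(add, map(square, p), 0) for p in partitions]
    return reduce(add, partials, 0)


data = list(range(10))
print(sequential(data), partitioned(data, 3))  # 285 285
```

The programmer writes only `square` and the combining operator; how the data is partitioned is left entirely to the runtime.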
As a result, a MapReduce tool was formulated, designed and developed. It consists of two core components: an
API for the MapReduce programming environment and a MapReduce runtime system implemented entirely on the
Hadoop Distributed File System (HDFS). The runtime system was specifically tailored for shared-memory
multi-core systems, with the teachers' web portal serving as the source of data.
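The split between a programming API and a runtime system can be sketched in a few lines. This is a minimal in-memory illustration under stated assumptions, not the tool's actual API: `run_mapreduce`, `word_count_map` and `sum_reduce` are hypothetical names, and the sketch does not read from HDFS. The user supplies only a map function and a reduce function; the runtime handles chunking the input, the shuffle (grouping values by key) and scheduling work on a pool of workers.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Iterable, List, Tuple


def word_count_map(line: str) -> List[Tuple[str, int]]:
    # Hypothetical user map function: emit (word, 1) for every word in a line.
    return [(word.lower(), 1) for word in line.split()]


def sum_reduce(key: str, values: List[int]) -> int:
    # Hypothetical user reduce function: total the counts for one word.
    return sum(values)


def _map_chunk(map_fn: Callable, chunk: List[Any]) -> Dict[Any, List[Any]]:
    # Map phase for one chunk; values are grouped by key locally so each
    # worker returns one dict rather than a long list of pairs.
    local: Dict[Any, List[Any]] = defaultdict(list)
    for record in chunk:
        for key, value in map_fn(record):
            local[key].append(value)
    return dict(local)


def run_mapreduce(records: Iterable[Any], map_fn: Callable, reduce_fn: Callable,
                  num_workers: int = 4, executor_cls=ThreadPoolExecutor) -> Dict[Any, Any]:
    records = list(records)
    # Split the input into at most num_workers contiguous chunks (ceiling division).
    size = max(1, -(-len(records) // num_workers))
    chunks = [records[i:i + size] for i in range(0, len(records), size)]
    with executor_cls(max_workers=num_workers) as pool:
        partials = list(pool.map(_map_chunk, [map_fn] * len(chunks), chunks))
        # Shuffle: merge the per-chunk groups so each key sees all of its values.
        grouped: Dict[Any, List[Any]] = defaultdict(list)
        for part in partials:
            for key, values in part.items():
                grouped[key].extend(values)
        keys = sorted(grouped)  # assumes keys are mutually comparable
        # Reduce phase: keys are independent, so they are reduced in parallel too.
        results = pool.map(reduce_fn, keys, [grouped[k] for k in keys])
        return dict(zip(keys, results))


print(run_mapreduce(["the cat", "The dog", "cat and dog"], word_count_map, sum_reduce, num_workers=2))
# {'and': 1, 'cat': 2, 'dog': 2, 'the': 2}
```

A thread pool is the default so the sketch runs anywhere, but CPython threads do not execute pure-Python code on several cores at once. For real multi-core speedup, pass `executor_cls=ProcessPoolExecutor` from `concurrent.futures`, keep the map and reduce functions at module top level so they can be pickled, and call `run_mapreduce` under an `if __name__ == "__main__":` guard.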
The MapReduce tool delivered considerable performance on multi-core architectures processing large datasets in
an operational distributed cluster. Careful evaluation showed that the tool's performance in data optimization
improved as the number of computing nodes and the data size increased, so the scalability of cluster
computing is considerably improved.
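The abstract does not state which metrics were used to assess scalability. For reference only, claims of this kind are conventionally expressed through speedup $S(p)$ and parallel efficiency $E(p)$, where $T(p)$ is the run time of the same job on $p$ nodes:

$$S(p) = \frac{T(1)}{T(p)}, \qquad E(p) = \frac{S(p)}{p}$$

As a purely hypothetical illustration (not a result of this study), a job that takes 120 s on one node and 40 s on four nodes has $S(4) = 120/40 = 3$ and $E(4) = 3/4 = 0.75$. When the data size grows in proportion to $p$ (weak scaling), good scalability instead means that $T(p)$ stays close to $T(1)$.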
The researcher's evaluation concluded that the tool can also improve the general security of data in the
system, specifically its availability: since all data is replicated across nodes, it remains available
throughout the system at all times.
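The availability argument rests on replication: each block of data is stored on several nodes, so losing a node does not lose data. HDFS replicates each block three times by default (the `dfs.replication` setting). The sketch below is a simplified model under explicit assumptions: it uses a round-robin placement rather than HDFS's actual rack-aware placement policy, and the node names and helper functions are hypothetical. It checks that with a replication factor of r, all data survives any r - 1 simultaneous node failures, but not every set of r failures.

```python
from itertools import combinations
from typing import Dict, List, Set


def place_replicas(num_blocks: int, nodes: List[str], replication: int = 3) -> Dict[int, List[str]]:
    # Simple round-robin placement (an assumption, not HDFS's rack-aware policy):
    # block b goes to `replication` consecutive nodes starting at b mod n,
    # which guarantees the copies of a block sit on distinct nodes.
    if not 1 <= replication <= len(nodes):
        raise ValueError("replication must be between 1 and the number of nodes")
    n = len(nodes)
    return {b: [nodes[(b + k) % n] for k in range(replication)] for b in range(num_blocks)}


def all_blocks_available(placement: Dict[int, List[str]], failed: Set[str]) -> bool:
    # A block is readable if at least one node holding a replica is still up.
    return all(any(node not in failed for node in replicas) for replicas in placement.values())


def tolerates_failures(placement: Dict[int, List[str]], nodes: List[str], k: int) -> bool:
    # Check every possible set of k failed nodes (k <= number of nodes); smaller
    # failure sets are subsets of these, so passing here covers them as well.
    return all(all_blocks_available(placement, set(failed)) for failed in combinations(nodes, k))


nodes = ["node1", "node2", "node3", "node4", "node5"]
placement = place_replicas(num_blocks=10, nodes=nodes, replication=3)
print(tolerates_failures(placement, nodes, 2))  # True
print(tolerates_failures(placement, nodes, 3))  # False
```

With three replicas, any two nodes can fail without data loss; losing the three nodes that hold the same block (here `node1` to `node3` for block 0) makes that block unavailable.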
It is therefore envisioned that the study will be of considerable benefit to strategists and policy makers in
formulating policies for the effective use of technology resources, by integrating various new and existing
cluster computing technologies to optimize resource use.
Citation: Master of Science in Computer Science
Sponsorship: University of Nairobi
Publisher: University of Nairobi School of Computing and Informatics