In memory data warehousing to improve on data analytics latency.

Mburu, John K

dc.contributor.author	Mburu, John K
dc.date.accessioned	2018-01-23T07:17:53Z
dc.date.available	2018-01-23T07:17:53Z
dc.date.issued	2017
dc.identifier.uri	http://hdl.handle.net/11295/102590
dc.description.abstract	Most banks in the Kenyan banking industry still use traditional reporting methods that involve querying data from different transactional databases that are not interconnected. Some business teams download reports from core transaction processing systems (TPS). The raw data is then analyzed in tools such as Microsoft Excel, using various Excel functions to report what the top business executives need. Every time these executives request a report, the reporting team, if any, must ask the technical information technology teams to generate the raw data with the requested fields and forward it, possibly to a shared folder, where the reporting team can access it in preparation for data analysis and reporting. This is time-consuming and prone to errors, and it requires the availability of the technical teams to ensure the process is completed. Some of the results produced using such traditional methods are inaccurate and at times wrong. There is also no way to combine results from all systems to get a complete view of a concept (a customer, for example). Using entire databases for reporting, instead of a few attributes of the database in the form of a modelled fact/dimension, means that it takes more time to process or mine data. The research aimed to establish whether this problem could be solved by replacing the traditional reporting mechanisms with a modern in-memory data warehousing prototype. This involved developing various data analytics components that created a new set of processes for data collection, cleaning/transformation and analysis. The Apache Spark in-memory solution was used to store more summarized data by organizing data into dimensions and measures for easier and faster retrieval. At the presentation layer, Apache Zeppelin was used to showcase data visualization.
The sampling method was based on the purposive sampling technique, studying the way data analysis is done by selected banking analytics professionals. The research used the case study approach, with data collected from the business reporting, technical and management teams involved in the reporting process. The prototype was evaluated to show what improvements were realized with the new in-memory system, using sample data that was large enough to compare with other options.	en_US
dc.language.iso	en	en_US
dc.publisher	University of Nairobi	en_US
dc.rights	Attribution-NonCommercial-NoDerivs 3.0 United States	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/us/	*
dc.subject	Data Analytics Latency	en_US
dc.title	In memory data warehousing to improve on data analytics latency.	en_US
dc.type	Thesis	en_US
dc.description.department	Department of Psychiatry, University of Nairobi; Department of Mental Health, School of Medicine, Moi University, Eldoret, Kenya

Files in this item

Name: license_rdf
Size: 1.203Kb
Format: application/rdf+xml

Name: Mburu_In Memory Data Warehousing ...
Size: 2.145Mb
Format: PDF
Description: full text

This item appears in the following Collection(s)

Faculty of Health Sciences (FHS) [4228]

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 United States