In memory data warehousing to improve on data analytics latency.
Mburu, John K
Most banks in the Kenyan banking industry still use traditional reporting methods that involve querying data from separate transactional databases that are not interconnected. Some business teams download reports from core transaction processing systems (TPS) and then analyze the raw data in tools such as Microsoft Excel, using various Excel functions to produce the reports that top business executives need. Every time these executives request a report, the reporting team, if there is one, must ask the technical information technology teams to generate the raw data with the requested fields and place it somewhere, such as a shared folder, where the reporting team can access it for analysis and reporting. This process is time consuming and prone to errors, and it depends on the availability of the technical teams. Some of the results produced using such traditional methods are inaccurate and at times wrong. There is also no way to combine results from all systems to get a complete view of a concept (a customer, for example). Reporting against entire databases, rather than against a few attributes modelled as facts and dimensions, means that processing or mining the data takes more time. The research aimed to establish whether this problem could be solved by replacing the traditional reporting mechanisms with a modern in-memory data warehousing prototype. This involved developing various data analytics components that created a new set of processes for data collection, cleaning/transformation and analysis. The Apache Spark in-memory solution was used to store more summarized data, organized into dimensions and measures for easier and faster retrieval. At the presentation layer, Apache Zeppelin was used for data visualization.
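The idea of keeping summarized data organized by dimensions and measures can be illustrated with a minimal sketch. It is written in plain Python rather than Apache Spark, and it is not the prototype described in this work: the sample rows, the field names (`customer_id`, `branch`, `month`, `amount`) and the helper functions `build_summary` and `total_for` are all hypothetical, chosen only to show why a report that reads from a pre-aggregated in-memory structure avoids rescanning raw transaction records.

```python
from collections import defaultdict

# Hypothetical raw transaction rows, as they might be exported from a TPS.
transactions = [
    {"customer_id": 1, "branch": "Nairobi", "month": "2017-01", "amount": 1200.0},
    {"customer_id": 2, "branch": "Nairobi", "month": "2017-01", "amount": 300.0},
    {"customer_id": 1, "branch": "Mombasa", "month": "2017-02", "amount": 450.0},
    {"customer_id": 3, "branch": "Nairobi", "month": "2017-02", "amount": 800.0},
]


def build_summary(rows, dimensions, measure):
    """Aggregate one measure, keyed by the values of the chosen dimensions."""
    summary = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        summary[key] += row[measure]
    return dict(summary)


def total_for(summary, branch, month):
    """Look up a precomputed total; returns 0.0 if the combination is absent."""
    return summary.get((branch, month), 0.0)


# Built once and held in memory; later report requests only do a lookup.
summary = build_summary(transactions, ("branch", "month"), "amount")

print(total_for(summary, "Nairobi", "2017-01"))  # 1500.0
```

In this sketch the raw rows are scanned once, when the summary is built, and each later request is a single dictionary lookup. An in-memory engine applies the same principle at a much larger scale.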
The sampling method was purposive: the study examined how data analysis is done by selected banking analytics professionals. The research used a case study approach in which data was collected from the business reporting, technical and management teams involved in the reporting process. The prototype was evaluated, using sample data large enough to allow comparison with other options, to show what improvements the new in-memory system achieved.
University of Nairobi
Subject: Data Analytics Latency
Rights: Attribution-NonCommercial-NoDerivs 3.0 United States