Towards judicial data warehousing and data mining
Institutions often use Extraction, Transformation and Loading (ETL) to move data into a data warehouse. A data warehouse is a central repository of information that can be retrieved later for analytics or other data mining activities. ETL is a process that takes information from one or more sources, normalizes it to a convenient schema, and then inserts it into another repository for future use. This is usually achieved with automated tools that make the process easier and more efficient. The choice of ETL tool may be dictated by many factors, among them the DBMS in use and compatibility with existing data sources and hardware infrastructure. Even though there are major efforts to develop databases for use in various departments, there is currently no data warehouse in the Kenya Judiciary. This makes it challenging to mine data. Instead, each department with such an initiative operates independently, with its preferred relational database that suits only the purpose of that particular department. In other cases, information is stored in different formats and data types or as flat files. There is a need to create a central repository for data from the various departments. However, this would call for major preparation of the collected data as well as of the infrastructure. One way to prepare is to adopt the ETL process, which uses ETL tools to carry out activities such as cleansing, customization, reformatting, integration, and insertion of data into a data warehouse. The Kenya Judiciary could benefit from this process, in which all data from operational databases is collected using standardized data collection tools, cleansed and inserted into a data warehouse.
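The extract, transform and load steps described above can be illustrated with a minimal sketch. It is not the tooling used in this study: the sample rows, the `cases` table, the column names and the date formats are all hypothetical, and an in-memory SQLite database stands in for a data warehouse.

```python
import csv
import io
import sqlite3
from datetime import datetime

# Hypothetical flat-file export from one department (invented sample rows).
SAMPLE_CSV = """case_no,court,filed_on,status
 CIV/001 ,Nairobi,03/01/2015,open
CIV/002,mombasa,2015-02-14, CLOSED
,Kisumu,05/03/2015,open
"""

# Assumed date formats found in the source data (day first, or ISO).
DATE_FORMATS = ("%d/%m/%Y", "%Y-%m-%d")


def extract(text):
    """Extract: read raw rows from a CSV flat file held in a string."""
    return list(csv.DictReader(io.StringIO(text)))


def parse_date(value):
    """Return an ISO date string, or None if no assumed format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def transform(rows):
    """Transform: cleanse and reformat rows into one common schema."""
    clean = []
    for row in rows:
        case_no = row["case_no"].strip()
        filed_on = parse_date(row["filed_on"])
        if not case_no or filed_on is None:
            continue  # drop rows that fail basic cleansing
        court = row["court"].strip().title()
        status = row["status"].strip().lower()
        clean.append((case_no, court, filed_on, status))
    return clean


def load(conn, rows):
    """Load: insert the cleansed rows into the (stand-in) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cases ("
        "case_no TEXT PRIMARY KEY, court TEXT, filed_on TEXT, status TEXT)"
    )
    conn.executemany("INSERT OR REPLACE INTO cases VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def run_etl(conn, text=SAMPLE_CSV):
    """Run the three steps in order and return the loaded rows."""
    load(conn, transform(extract(text)))
    query = "SELECT case_no, court, filed_on, status FROM cases ORDER BY case_no"
    return conn.execute(query).fetchall()
```

Calling `run_etl(sqlite3.connect(":memory:"))` loads the two valid sample rows and skips the row with no case number. A production ETL tool would add logging of rejected rows, incremental loads and connectors for each departmental database, none of which is shown here.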
The main objective of this study is to demonstrate how the Judiciary would benefit from using the ETL process to migrate data from a sample source system, make changes to the data and load it into a specified destination, ready for further exploration or analysis using other data mining tools. This research adopts a descriptive research design, as it entails systematic collection of information, careful selection of the units to be studied, and careful measurement of each variable or unit of data collected, whether as entities, fields or attributes. Data will be collected using document analysis and interviews. Results will be presented in tables and as CSV and XLS files.