Extracting diagnosis patterns in electronic medical records using association rule mining
Kang'ethe, Stephen M
Wagacha, Peter W
Data mining technologies have been used extensively in commercial retail to extract data from "big data" warehouses. In healthcare, data mining has also been applied in various ways, which we explore. The voluminous data generated by medical systems form a good basis for discovering interesting patterns that may aid decision making and save lives, reduce the cost of research work, and possibly reduce morbidity prevalence. We therefore set out to implement a concept that uses association rule mining to find possible diagnostic associations in patients' medical records spanning multiple contacts of care. The dataset was obtained from Practice Fusion's open research data and contained over 98,000 patient clinic visits from all American states. Using an implementation of the classical Apriori algorithm, we mined for patterns in medical diagnosis data. The diagnosis data was coded in ICD-9, which helped limit the set of possible diagnostic groups for the analysis. We then subjected the results to domain expert opinion. The panel of experts validated some of the most common associations, which had minimum confidence levels of between 56% and 76%, with a concurrence rate of 90%, whereas others elicited debate among the medical practitioners. The results showed that association rule mining can not only confirm what is already known from health data in the form of comorbidity patterns, but also generate interesting disease diagnosis associations. These provide a starting point for further studies by medical researchers to explain patterns that are seemingly unknown or peculiar in the populations concerned.

... implementations, but also to query, analyze and extract useful statistics from data entered in the same systems.
The need for EMR systems has been driven by factors including the complexity of medical data, the influx of patients, and the need for proper recording of health data. Well-developed EMR systems are likely to improve the quality and reliability of health data and to support standardized reporting. The standards of particular interest in our research are the International Classification of Diseases (ICD) standards (both ICD-9 and ICD-10) and the HL7 health information interchange standards.

In their work "Fast algorithms for mining association rules in large databases", the authors presented an algorithm, known as Apriori, for discovering association rules within large, primarily transactional, sales databases. This algorithm built on previously known algorithms for itemset mining and association rule discovery. We look briefly at how this algorithm works and at its known uses in commercial databases, particularly retail sales databases, for which the authors acknowledge the algorithm was originally conceived. We also explore the benefits of this algorithm over other known algorithms for association rule mining.

The availability of standardized medical data creates a large pool of data containing hidden and potentially useful information. Using association rule mining, and the Apriori algorithm in particular, we seek to uncover the hidden diagnosis patterns that could be present in the data made available by these systems. We also intend to discover strong rules (relationships) that indicate multimorbidity trends in the EMR data, using varying measures of interestingness.
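
To make the idea of mining diagnosis associations concrete, the following is a minimal Python sketch of level-wise Apriori followed by rule generation with support, confidence and lift. It is not the implementation used in this work. The function names (`apriori`, `rules`), the thresholds, and the six toy patient records are assumptions made purely for illustration; the ICD-9 codes are real codes, but the records are invented and carry no clinical meaning.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset.
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current
                             for sub in combinations(c, k - 1))}
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


def rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence, lift) tuples."""
    found = []
    for itemset, supp in frequent.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                a = frozenset(antecedent)
                c = itemset - a
                confidence = supp / frequent[a]   # P(c | a)
                lift = confidence / frequent[c]   # > 1: positive association
                if confidence >= min_confidence:
                    found.append((a, c, supp, confidence, lift))
    return found


# Invented toy data: one set of ICD-9 diagnosis codes per patient.
# 401.9 hypertension, 250.00 type II diabetes, 272.4 hyperlipidemia,
# 493.90 asthma, 477.9 allergic rhinitis.
visits = [
    {"401.9", "250.00", "272.4"},
    {"401.9", "250.00"},
    {"401.9", "272.4"},
    {"493.90", "477.9"},
    {"401.9", "250.00", "272.4"},
    {"493.90"},
]

frequent = apriori(visits, min_support=0.3)
found = rules(frequent, min_confidence=0.6)
for a, c, supp, conf, lift in found:
    print(sorted(a), "->", sorted(c),
          f"support={supp:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

Confidence alone can be high simply because the consequent is common, which is one reason additional measures of interestingness such as lift are reported alongside it in this sketch.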