dc.description.abstract | Data mining technologies have been used extensively in the
commercial retail sector to extract useful information from “big data”
warehouses. In healthcare, data mining has also been applied in
various areas, which we explore. The voluminous data
generated by medical systems form a good basis for
discovering interesting patterns that may aid decision making
and save lives, as well as reduce research costs
and possibly the prevalence of morbidity. With this in mind,
we set out to implement a concept that uses association rule
mining to find possible diagnostic
associations in patients’ medical records
spanning multiple contacts of care. The dataset was
obtained from Practice Fusion’s open research data, which
contained over 98,000 patient clinic visits from all American
states.
Using an implementation of the classical Apriori algorithm, we
mined for patterns in medical diagnosis
data. The diagnosis data was coded in ICD-9, which
helped limit the set of possible diagnostic groups for the
analysis. We then subjected the results to domain expert
opinion. The panel of experts validated some of the most
common associations, which had minimum confidence levels of
between 56% and 76%, with a concurrence rate of 90%, whereas others
elicited debate amongst the medical practitioners. The results of
our research showed that association rule mining can not only
confirm what is already known from health data in
the form of comorbidity patterns, but also generate
interesting disease diagnosis associations. These provide a
starting point for further
studies by medical researchers to explain patterns that are
seemingly unknown or peculiar in the populations concerned.
implementations, but to also query, analyze and extract useful
statistics from data entered in the same systems.
The need for EMR systems has been driven by several
factors, including complex medical data, the influx of patients
and the need for proper recording of health data. When
EMR systems are well developed, they are likely to improve
the quality and reliability of health data, as well as
standardized reporting [1].
The standards of particular interest in our research
are the International Classification of Diseases (ICD) standards
(both ICD-9 and ICD-10) and the HL7 health information
interchange standards.
In their work "Fast algorithms for mining association rules in
large databases" [2], the authors presented an algorithm,
known as Apriori, for discovering association rules within
large, primarily transactional, sales databases. This algorithm
built on previously known algorithms for itemset
mining and association rule discovery. We briefly look at
how this algorithm works and its known uses in
commercial databases, particularly retail sales databases, for which the
authors state the algorithm was originally conceived. We will
also explore the benefits of this algorithm over
other known algorithms for association rule mining.
The availability of standardized medical data creates a large
pool of data containing hidden and potentially useful
information. Using association rule mining, and the Apriori
algorithm in particular, we seek to uncover the hidden diagnosis
patterns that could be present in the data made available by these
systems. We also intend to discover strong rules
(relationships) that indicate multimorbidity trends in the
EMR data, using varying measures of interestingness. | |
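
The approach described above can be illustrated with a minimal sketch, assuming a level-wise Apriori search in which each clinic visit is treated as a set of ICD-9 codes. The function names (`apriori`, `rules`), the thresholds and the five toy visits are illustrative assumptions only; they are not the implementation, parameters or data used in the study.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset whose support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level 1 candidates: every single item that appears anywhere.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count the transactions that contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: cnt for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent itemsets into candidates of size k.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def rules(frequent, n, min_confidence):
    """Return (antecedent, consequent, support, confidence) for strong rules."""
    found = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                # confidence(A -> B) = count(A and B) / count(A)
                confidence = count / frequent[antecedent]
                if confidence >= min_confidence:
                    found.append(
                        (antecedent, itemset - antecedent, count / n, confidence)
                    )
    return found


# Made-up visits for illustration; each set holds the ICD-9 codes of one visit.
visits = [
    {"250.00", "401.9", "272.4"},
    {"250.00", "401.9"},
    {"401.9", "272.4"},
    {"250.00", "401.9", "272.4"},
    {"530.81"},
]

counts = apriori(visits, min_support=0.4)
found = rules(counts, len(visits), min_confidence=0.7)
for antecedent, consequent, support, confidence in found:
    print(sorted(antecedent), "->", sorted(consequent),
          f"support={support:.2f} confidence={confidence:.2f}")
```

Confidence is only one measure of interestingness. In this sketch it is computed from raw counts rather than from support fractions, which keeps the ratios free of avoidable floating-point rounding.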