dc.description.abstract | Routine health facility data is collected using health information systems. In Kenya, it is
collected using the District Health Information System (DHIS2). This data is continuously
collected and cheaper to obtain than survey data. There has been increased
advocacy for using this data by governments and development organizations such as the
World Health Organization (WHO). However, it is unclear how much DHIS2 data
one needs to estimate indicators. All the studies that have used routine data use all the
available reports to obtain estimates. This study proposes a novel sub-sampling approach
to the estimation of indicators from routine data. The null hypothesis the study set out
to test was that smaller subsamples of routine data provide credible estimates.
Data from 1,808 health facilities in Western Kenya was obtained from DHIS2. Five data
elements were used to compute three indicators: the number of DPT1 doses, the number of
DPT3 doses, the number of LLINs distributed to pregnant mothers attending ANC, the
number of pregnant women completing at least 4 ANC visits, and the number of pregnant
women completing the first ANC visit. The three indicators calculated from these data
elements were the coverage of the third dose of pentavalent vaccine
(DPT3), the proportion of pregnant women who receive LLINs, and the proportion of
pregnant women who completed at least 4 ANC visits. The study then used both spatial
and non-spatial sampling to obtain proportions of data from the entire dataset and compute
estimates. The proportions were 90%, 80%, 70%, 60%, 50%, 40%, 30%, and 20%. Spatial
sampling was used because the indicators of interest exhibit some spatial variability.
The study then used a z-test to determine whether a significant difference exists between
the subsample estimates and the population estimates. We also used power calculations
to determine the statistical power of each subsample.
The results from the study indicate that there was no significant difference between the
population estimate and the subsample estimates under either spatial or non-spatial
sampling (all p-values > 0.05). This implies that one does not need the whole data set to
obtain estimates from DHIS2, and the sampling design does not matter unless the indicator
of interest exhibits some spatial variation. However, larger samples had narrower
confidence intervals, so we recommend sampling
above 60%. The power calculation also supported this conclusion. We found that although
the power of the subsamples to obtain estimates was generally high (> 70%), it decreased as
the sample size decreased. | en_US |