Comparison of elastic net and random forest in identifying risk factors of stunting in children under five years of age in Kenya
Abstract
Background: Children whose height-for-age z-score (HAZ) is more than 2 standard deviations
(SDs) below the median of the World Health Organization (WHO) child growth standards are
said to be stunted; that is, they are too short for their age. The prevalence of stunting is
calculated as the number of under-five children whose HAZ is below -2 SDs from the median
of the WHO child growth standards, divided by the total number of under-five children
who are measured. According to the Kenya Demographic and Health Survey (KDHS, 2014), the
national prevalence of stunting among under-five children was 26%, slightly higher
than the 25% average prevalence in developing countries.
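The prevalence calculation described above can be sketched in a few lines of Python. This is only an illustration of the definition: the function name, the treatment of unmeasured children as `None`, and the HAZ values are assumptions made for the example and are not taken from the KDHS data.

```python
def stunting_prevalence(haz_scores, cutoff=-2.0):
    """Share of measured children whose HAZ is strictly below the cutoff."""
    # Children without a measurement (None) are left out of the denominator.
    measured = [z for z in haz_scores if z is not None]
    if not measured:
        raise ValueError("no measured children")
    stunted = sum(1 for z in measured if z < cutoff)
    return stunted / len(measured)


# Hypothetical HAZ values for illustration only; None marks an unmeasured child.
example_haz = [-2.5, -1.0, 0.3, -3.1, None, -1.9, -2.0, 0.8]
prevalence = stunting_prevalence(example_haz)
print(f"Stunting prevalence: {prevalence:.1%}")
```

A child with a HAZ of exactly -2 is not counted as stunted here, because the definition requires a score below -2 SDs.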
Objective: This work compares Random Forest and Elastic Net in identifying determinants
of stunting in under-five children, with Variable Importance as the key outcome.
Methods: The Kenya Demographic and Health Survey (KDHS) women and children data were
used for analysis. The data were cleaned using STATA and analyzed with R software. Because
the classes of the response variable were imbalanced, the Synthetic Minority Oversampling
Technique (SMOTE) was employed to obtain class-balanced data. Missing observations
were imputed using the rfImpute function from the randomForest library in R. Random
Forest and Elastic Net algorithms were used to obtain determinants of stunting, while the Area
Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) Curve was used
to compare the models.
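As a rough illustration of how AUC can be used to compare two classifiers, the sketch below computes AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as half. It is not the study's R code; the function name, labels, and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = stunted) and predicted scores from two models.
labels = [1, 1, 1, 0, 0, 0]
model_a_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
model_b_scores = [0.9, 0.3, 0.4, 0.5, 0.6, 0.2]

auc_a = roc_auc(labels, model_a_scores)
auc_b = roc_auc(labels, model_b_scores)
print(f"Model A AUC: {auc_a:.3f}, Model B AUC: {auc_b:.3f}")
```

The model with the higher AUC ranks stunted children above non-stunted children more consistently. The pairwise loop is quadratic in the number of cases, so a rank-based implementation would be preferable for survey-sized data.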
Results: The top five factors in terms of importance according to Random Forest were
underweight status, region, child's age, ethnicity, and mother's current age. According to
the Elastic Net algorithm, the top five variables by coefficient were underweight children,
Nairobi region, a preceding birth interval of 60+ months, children aged 12-23 months, and
children of Luhya ethnicity. In terms of the ROC values, Random Forest had an AUC
of 0.92 while Elastic Net had an AUC of 0.86.
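For context on what the Elastic Net coefficients represent, the standard elastic net objective for a binary outcome is written out below. This is the general textbook form, assuming a logistic loss for a stunted/not-stunted response; the abstract does not state the model family, the R package, or the values of the tuning parameters $\lambda$ and $\alpha$ used in the study.

```latex
\hat{\beta}_0, \hat{\beta}
  = \arg\min_{\beta_0,\,\beta}
    \left[
      -\frac{1}{n} \sum_{i=1}^{n}
        \left( y_i \,(\beta_0 + x_i^{\top}\beta)
               - \log\!\left(1 + e^{\beta_0 + x_i^{\top}\beta}\right) \right)
      + \lambda \left( \alpha \lVert \beta \rVert_1
                       + \frac{1-\alpha}{2} \lVert \beta \rVert_2^2 \right)
    \right]
```

Here $y_i \in \{0, 1\}$ is the outcome for child $i$, $x_i$ is the vector of predictors, $\lambda \ge 0$ sets the overall penalty strength, and $\alpha \in [0, 1]$ mixes the lasso ($\alpha = 1$) and ridge ($\alpha = 0$) penalties. Variables whose coefficients are shrunk to zero drop out of the model, and the remaining ones can be ranked by coefficient magnitude.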
Conclusion: Based on our findings, most of the top-ranked important variables selected
by Random Forest and Elastic Net are similar. Nevertheless, Random Forest performed
better than the Elastic Net algorithm in identifying the determinants of stunting in
under-five children.
Keywords: Stunting, Random Forest, Elastic Net, Variable Importance, Gini Index, Area
Under the Curve (AUC), Receiver Operating Characteristic Curve (ROC), Missing values
Publisher
University of Nairobi
Rights
Attribution-NonCommercial-NoDerivs 3.0 United States
Usage Rights
http://creativecommons.org/licenses/by-nc-nd/3.0/us/