Feature selection methods and resampling techniques in survival data: determination of risk factors of under-five child mortality
Date
2021
Author
Chelangat, Maureen, R
Type
Thesis
Language
English
Abstract
The main aim of this research was to identify the risk factors of under-five child mortality
using the Kenya Demographic and Health Survey (KDHS) 2014 data set. Demographic and Health
Surveys face three main challenges: acute class imbalance, high dimensionality
and missing data. The KDHS 2014 data set comprises 1,129 variables and 20,964
observations, and the mortality class accounts for only 4% of the observations, with the non-mortality
class making up the remaining 96%. To determine the risk factors, we first
handled the missing data by imputation. The class imbalance was then addressed using three
balancing methods: both sampling (a combination of over- and under-sampling), under-sampling
and over-sampling. High dimensionality was handled using three filter methods. A random
survival forest was used to select highly predictive variables, and parameters were estimated
with a Cox proportional hazards (Cox-PH) model.
The variables found to be significant were: whether the child is a twin, sex of the child, births
in the last five years, currently pregnant, wanted pregnancy, living children plus current
pregnancy, wanted last child, whether the respondent slept under a mosquito bed net, ideal number of
children, disposal of the youngest child's stools when not using a toilet, receipt of polio vaccine
and weight-for-age standard deviation.
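The three balancing strategies named in the abstract can be illustrated with a minimal sketch. It assumes plain random resampling of a two-class data set and is not the implementation used in the thesis; the helper `resample`, the "midpoint" size rule for both sampling, and the toy data are hypothetical choices made for illustration.

```python
import random
from collections import Counter


def resample(rows, label_key, method, seed=0):
    """Randomly rebalance a two-class data set (hypothetical helper).

    method="under": shrink the majority class to the minority size.
    method="over":  grow the minority class to the majority size.
    method="both":  under-sample the majority and over-sample the minority
                    so both meet at the midpoint size (an assumed rule;
                    other tools choose the target size differently).
    """
    rng = random.Random(seed)
    groups = {}
    for row in rows:
        groups.setdefault(row[label_key], []).append(row)
    if len(groups) != 2:
        raise ValueError("expected exactly two classes")
    minority, majority = sorted(groups.values(), key=len)

    if method == "under":
        target = len(minority)
    elif method == "over":
        target = len(majority)
    elif method == "both":
        target = (len(minority) + len(majority)) // 2
    else:
        raise ValueError(f"unknown method: {method!r}")

    def to_size(group, n):
        if n <= len(group):
            return rng.sample(group, n)  # draw without replacement
        # keep every original row, then add randomly chosen duplicates
        return group + rng.choices(group, k=n - len(group))

    return to_size(minority, target) + to_size(majority, target)


if __name__ == "__main__":
    # Toy data: 4 deaths among 100 children, mimicking a 4% minority class.
    rows = [{"id": i, "died": int(i < 4)} for i in range(100)]
    for m in ("under", "over", "both"):
        print(m, Counter(r["died"] for r in resample(rows, "died", m)))
```

Under-sampling discards most of the majority class, over-sampling only duplicates the few minority rows, and both sampling trades off between the two, which is why comparing all three is informative.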
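A filter method scores each candidate variable on its own, independently of any model, and keeps the highest-scoring ones. The abstract does not name the three filters used, so the sketch below assumes a single illustrative score, the absolute Pearson correlation with the death indicator; `pearson`, `filter_top_k` and the toy columns are hypothetical. This particular score ignores survival time and censoring, so it is only a stand-in for the general idea.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no signal
    return sxy / math.sqrt(sxx * syy)


def filter_top_k(columns, target, k):
    """Keep the k variables with the largest |correlation| with target."""
    scores = {name: abs(pearson(vals, target)) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    died = [1, 0, 1, 0, 1, 0]  # toy event indicator
    columns = {
        "a": [1, 0, 1, 0, 1, 0],  # perfectly aligned with died
        "b": [1, 1, 0, 0, 1, 0],  # weakly related
        "c": [2, 2, 2, 2, 2, 2],  # constant
        "d": [0, 1, 0, 1, 0, 1],  # perfectly inverse
    }
    print(filter_top_k(columns, died, 2))
```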
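Parameter estimation in a Cox-PH model maximises the partial log-likelihood l(β) = Σ over deaths i of [β·x_i − log Σ_{j: t_j ≥ t_i} exp(β·x_j)], where the inner sum runs over the children still at risk at time t_i. The sketch below computes this for a single covariate and maximises it with Newton-Raphson updates, using the Breslow convention for tied times. It is a from-scratch illustration with hypothetical function names and toy data, not the software used in the thesis; a real analysis would fit many covariates with a survival library.

```python
import math


def _risk_sums(beta, times, xs, t):
    """Sums of w, w*x, w*x^2 over the risk set {j : t_j >= t}, w = exp(beta*x)."""
    s0 = s1 = s2 = 0.0
    for tj, xj in zip(times, xs):
        if tj >= t:  # Breslow: everyone tied at t stays in the risk set
            w = math.exp(beta * xj)
            s0 += w
            s1 += w * xj
            s2 += w * xj * xj
    return s0, s1, s2


def partial_loglik(beta, times, events, xs):
    ll = 0.0
    for ti, di, xi in zip(times, events, xs):
        if di:  # only deaths contribute; censored rows enter via risk sets
            s0, _, _ = _risk_sums(beta, times, xs, ti)
            ll += beta * xi - math.log(s0)
    return ll


def score_and_info(beta, times, events, xs):
    """First derivative (score) and negative second derivative (information)."""
    u = info = 0.0
    for ti, di, xi in zip(times, events, xs):
        if di:
            s0, s1, s2 = _risk_sums(beta, times, xs, ti)
            mean = s1 / s0
            u += xi - mean
            info += s2 / s0 - mean * mean
    return u, info


def fit_cox_1d(times, events, xs, tol=1e-10, max_iter=50):
    beta = 0.0
    for _ in range(max_iter):
        u, info = score_and_info(beta, times, events, xs)
        if info <= 0:
            raise ValueError("covariate does not vary within risk sets")
        step = u / info  # Newton-Raphson step on a concave log-likelihood
        beta += step
        if abs(step) < tol:
            break
    return beta


if __name__ == "__main__":
    # Toy data: follow-up time, death indicator (0 = censored), covariate.
    times = [1, 2, 3, 4, 5, 6]
    events = [1, 1, 0, 1, 1, 1]
    xs = [1, 0, 1, 1, 0, 0]
    b = fit_cox_1d(times, events, xs)
    print("beta:", b, "hazard ratio:", math.exp(b))
```

The fitted exp(β) is the estimated hazard ratio for a one-unit increase in the covariate, which is how Cox-PH estimates are usually reported as risk factors.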
Publisher
University of Nairobi
Rights
Attribution-NonCommercial-NoDerivs 3.0 United States
Usage Rights
http://creativecommons.org/licenses/by-nc-nd/3.0/us/