Use of CART and logistic regression analysis to identify key determinants of pregnancy wastage
This master's research project compares the performance of Classification and Regression Trees (CART) and logistic regression in studying determinants of pregnancy wastage, using pregnancy information from a population-based sample survey, the Kenya Demographic and Health Survey 2008/2009. The project report also describes in detail the fundamental principles of tree construction, splitting algorithms and pruning procedures. It then briefly introduces logistic regression and compares the results of the two statistical methods using the Receiver Operating Characteristic (ROC) curve, variable importance and the Hosmer-Lemeshow goodness-of-fit test. Logistic regression performed slightly better than CART in terms of the area under the ROC curve (AUC), with both methods agreeing on the age of the woman as the most important determinant of pregnancy wastage. CART found the age of the woman, highest level of educational attainment, age at first birth, type of place of residence (urban or rural) and birth order to be the most important determinants of pregnancy wastage. Logistic regression found the age of the woman, marriage to first birth interval, use of anti-malarials during pregnancy, type of place of residence and use of iron supplementation during pregnancy to be the most important determinants. The Hosmer-Lemeshow goodness-of-fit test showed that CART did not fit the data well, while the same test showed that logistic regression did fit the data well. The lack of close fit could be explained by the nature of the data, and this needs further investigation comparing fits on both population-based data and obstetric data. However, CART results could be used to select key variables for use in logistic regression analysis. When applied prudently, both CART and logistic regression are suitable for the analysis of the determinants of pregnancy wastage.
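The two comparison criteria named above, AUC and the Hosmer-Lemeshow statistic, can be illustrated with a minimal sketch. It is not the project's own code: the function names `auc` and `hosmer_lemeshow`, the choice of equal-sized groups ordered by predicted probability, and the toy values are assumptions made here for illustration only. Any predicted probabilities, whether from a fitted tree or a fitted logistic model, could be passed in.

```python
def auc(y_true, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity:
    the share of (positive, negative) pairs in which the positive case
    receives the higher score, counting ties as one half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def hosmer_lemeshow(y_true, probs, groups=10):
    """Hosmer-Lemeshow chi-square statistic and its degrees of freedom.

    Observations are sorted by predicted probability and cut into
    `groups` roughly equal-sized bins (an assumption of this sketch).
    Each bin contributes (O - E)^2 / (n * p_bar * (1 - p_bar)), where
    O is the observed number of events, E the expected number, n the
    bin size and p_bar the mean predicted probability in the bin.
    """
    pairs = sorted(zip(probs, y_true))
    total = len(pairs)
    stat = 0.0
    used = 0
    for g in range(groups):
        chunk = pairs[g * total // groups:(g + 1) * total // groups]
        if not chunk:
            continue  # fewer observations than groups
        size = len(chunk)
        observed = sum(y for _, y in chunk)
        expected = sum(p for p, _ in chunk)
        mean_p = expected / size
        # A bin whose mean probability is exactly 0 or 1 has zero
        # variance; it is skipped here rather than dividing by zero.
        if 0.0 < mean_p < 1.0:
            stat += (observed - expected) ** 2 / (size * mean_p * (1.0 - mean_p))
        used += 1
    return stat, used - 2


# Hypothetical toy values, not survey data.
outcomes = [0, 0, 1, 1]
predicted = [0.1, 0.4, 0.35, 0.8]
print("AUC:", auc(outcomes, predicted))
print("Hosmer-Lemeshow (statistic, df):", hosmer_lemeshow(outcomes, predicted, groups=2))
```

A larger AUC indicates better discrimination, whereas a large Hosmer-Lemeshow statistic relative to a chi-square distribution with the returned degrees of freedom indicates poor calibration. A model can therefore rank cases reasonably well and still fail the goodness-of-fit test.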