Imputation Techniques in Multivariate Analysis
This research deals with the problem of missing data in multivariate analysis, in the sense that not all variables of interest are measured on every unit or element of the sample. The thesis emphasizes imputation techniques as a method of handling the missing data problem in multivariate analysis. Special attention is paid to the method of Buck (1960), a pioneering imputation method for estimating the covariance matrix of any k-variate population in the presence of missing values. We have extended Buck's method to units with more than one missing value and obtained the properties of the resulting estimators. A simplified procedure for estimating the bias of the variances of the observed and imputed data has also been developed. On the basis of this simplified procedure, a functional relationship between the relative bias and the coefficient of determination has been established. It has also been shown that, for some patterns of missingness, Buck's method makes maximum use of the available information. The problems caused by imputation via Buck's method in regression analysis are studied, and it has been shown that the presence of imputed values creates serious biases in the resulting estimates. For the model-based strategy, it has been shown that the factorization method of Anderson (1957) is equivalent to the special case of Buck's method in which units have only one missing value, on a single variable. We have also shown that this equivalence does not hold for units with more than one missing value. It is further shown that, under normality assumptions, the EM algorithm is equivalent to an iterated version of Buck's method. Finally, we have attempted to lay a foundation for extending Buck's method to non-randomly missing data. With that aim, the work of Nordheim (1978, 1984) has been extended to the case of non-random misclassification.
Throughout the thesis, numerical illustrations and validations of the theoretical results are given using real data. The data were analyzed using the SPSS and STATGRAPHICS statistical software packages.
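As an illustration of the kind of estimator discussed above, the following Python sketch implements Buck's regression imputation in its simplest setting only: two variables, with missing values (coded as `np.nan`) in `y` alone, deleted completely at random. It does not reproduce the thesis's extension to several missing values per unit or its bias derivation. The function names `buck_bivariate` and `variance_estimates`, the simulated bivariate normal data, and the parameter values are hypothetical choices made for this example. The variance of the filled-in data is biased downward because imputed values lie on the regression line; Buck's correction adds the residual variance once per imputed value. In this simple setting the uncorrected relative bias is roughly -m(1 - R^2), where m is the fraction missing and R^2 the squared correlation; this is a standard back-of-envelope approximation, not necessarily the relationship established in the thesis.

```python
import numpy as np


def buck_bivariate(x, y):
    """Buck-style regression imputation when only y has missing values (np.nan).

    Returns the filled-in y, the residual variance from the complete cases,
    and a boolean mask marking the imputed positions.
    """
    obs = ~np.isnan(y)
    # Least-squares fit of y on x using complete cases only;
    # np.polyfit returns coefficients highest power first: [slope, intercept].
    slope, intercept = np.polyfit(x[obs], y[obs], 1)
    y_filled = y.copy()
    y_filled[~obs] = intercept + slope * x[~obs]
    resid = y[obs] - (intercept + slope * x[obs])
    # Residual variance with n_obs - 2 degrees of freedom (two fitted parameters).
    s2_resid = resid @ resid / (obs.sum() - 2)
    return y_filled, s2_resid, ~obs


def variance_estimates(y_filled, s2_resid, n_missing):
    """Naive variance of the filled-in data, and Buck's corrected version,
    which adds the residual variance once per imputed value to the sum of squares."""
    n = y_filled.size
    ss = np.sum((y_filled - y_filled.mean()) ** 2)
    naive = ss / (n - 1)
    corrected = (ss + n_missing * s2_resid) / (n - 1)
    return naive, corrected


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical settings: sample size, corr(x, y), and missing fraction.
    n, rho, m = 5000, 0.5, 0.4
    x = rng.standard_normal(n)
    y_full = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n)

    # Missing completely at random: delete y for roughly a fraction m of units.
    y = y_full.copy()
    y[rng.random(n) < m] = np.nan

    y_filled, s2_resid, imputed = buck_bivariate(x, y)
    naive, corrected = variance_estimates(y_filled, s2_resid, imputed.sum())
    ref = y_full.var(ddof=1)  # variance of the complete (pre-deletion) data
    print(f"relative bias, naive:     {naive / ref - 1:+.3f}")
    print(f"relative bias, corrected: {corrected / ref - 1:+.3f}")
    print(f"approximation -m(1-R^2):  {-m * (1 - rho**2):+.3f}")
```

With these settings the naive relative bias should sit near -0.3 and the corrected one near zero, up to sampling noise. Only the variance needs correcting here: with a single incomplete variable, the covariance between `x` and the filled-in `y` is estimated without this kind of downward bias.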