Imputation Techniques In Multivariate Analysis
Abstract
This research deals with the problem of missing data in multivariate
analysis, in the sense that not all variables of interest are measured on every
unit or element of the sample. The emphasis of the thesis is on imputation
techniques as a method of handling the missing data problem in multivariate
analysis. Special attention is paid to the method of Buck (1960) as a pioneering
imputation method for estimating the covariance matrix of any
k-variate population in the presence of missing values.
We have extended Buck's method to the case of units with more than
one missing value and obtained the properties of the resulting estimators.
A simplified procedure for the estimation of the bias of the variances of the
observed and imputed data has also been developed. On the basis of the
simplified procedure, a functional relationship between the relative bias and
the coefficient of determination has been established. It has also been shown
that for some patterns of missingness, Buck's method makes maximum use
of the available information.
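As a rough illustration of the regression-imputation idea behind Buck's method, applied to units with one or several missing values, the following Python sketch fills each missing block from a regression on the observed variables, with the mean vector and covariance matrix estimated from the complete units only. The function name `buck_impute`, the use of NaN to mark missing values, and the NumPy implementation are assumptions made for illustration and are not taken from the thesis. The sketch covers only the imputation step and does not include any correction for the bias of the variances.

```python
import numpy as np

def buck_impute(X):
    """Regression imputation in the spirit of Buck (1960).

    Hypothetical helper: NaN marks a missing value, and at least two
    complete units are assumed to be available.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    complete = X[~missing.any(axis=1)]       # units with every variable observed
    mu = complete.mean(axis=0)               # complete-case mean vector
    S = np.cov(complete, rowvar=False)       # complete-case covariance matrix
    filled = X.copy()
    for i in np.where(missing.any(axis=1))[0]:
        m = missing[i]                       # variables missing on unit i
        o = ~m                               # variables observed on unit i
        if not o.any():
            filled[i] = mu                   # nothing observed: fall back to the mean
            continue
        # Coefficients of the regression of the missing block on the observed block
        beta = np.linalg.solve(S[np.ix_(o, o)], S[np.ix_(o, m)])
        filled[i, m] = mu[m] + (X[i, o] - mu[o]) @ beta
    return filled
```

Because the imputed values lie exactly on the fitted regression surface, the variances computed from the filled-in data are too small, which is why a bias adjustment of the kind studied in the thesis is needed before the covariance matrix is used.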
The problems caused by imputation via Buck's method in regression
analysis are studied. It has been shown that the presence of the imputed
values creates serious biases in the obtained estimates.
For the case of the model-based strategy it has been shown that the
factorization method of Anderson (1957) is equivalent to the special case of
Buck's method where units have one missing value subject to one variable.
We have also shown that this equivalence of the two methods does not hold
for the case of units with more than one missing value. It is also shown that,
under normality assumptions, the EM algorithm is equivalent to an iterated
version of Buck's method.
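The connection between the EM algorithm and repeated regression imputation can be sketched in Python for a multivariate normal model. This is a standard textbook form of EM, not the derivation given in the thesis; the function name `em_mvn`, the NaN convention for missing values, the starting values, and the fixed iteration count are all assumptions made for illustration. The E-step fills each missing block with its conditional mean, which is a regression imputation under the current estimates, and accumulates the conditional covariance of the missing block; the M-step then re-estimates the mean vector and covariance matrix.

```python
import numpy as np

def em_mvn(X, n_iter=50):
    """EM for the mean and covariance of a multivariate normal with NaN gaps.

    Hypothetical helper: every variable is assumed to be observed on at
    least one unit and to have positive observed variance.
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    missing = np.isnan(X)
    mu = np.nanmean(X, axis=0)                 # start from available-case means
    sigma = np.diag(np.nanvar(X, axis=0))      # and a diagonal covariance matrix
    filled = X.copy()
    for _ in range(n_iter):
        filled = X.copy()
        extra = np.zeros((k, k))               # summed conditional covariances
        for i in range(n):
            m = missing[i]
            if not m.any():
                continue
            o = ~m
            if o.any():
                # E-step: regression of the missing block on the observed block
                beta = np.linalg.solve(sigma[np.ix_(o, o)], sigma[np.ix_(o, m)])
                filled[i, m] = mu[m] + (X[i, o] - mu[o]) @ beta
                extra[np.ix_(m, m)] += sigma[np.ix_(m, m)] - sigma[np.ix_(m, o)] @ beta
            else:
                filled[i] = mu                 # nothing observed on this unit
                extra += sigma
        # M-step: maximum likelihood updates from the completed statistics
        mu = filled.mean(axis=0)
        centered = filled - mu
        sigma = (centered.T @ centered + extra) / n
    return mu, sigma, filled
```

The `extra` term is what separates this from naively re-running regression imputation and recomputing the covariance matrix: it restores the residual variance that the imputed values themselves do not carry.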
Finally, we have made an attempt to lay a foundation for extending
Buck's method to handle non-randomly missing data. With that in mind,
the work of Nordheim (1978, 1984) has been extended by considering the
case of non-random misclassification.
Throughout the thesis, numerical illustrations and validations of the
theoretical results are given using real data. The data are analyzed
using the SPSS and STATGRAPHICS statistical software packages.
Citation
Doctor of Philosophy in Mathematics
Publisher
University of Nairobi School of Mathematics