Exploratory Loan Data Analysis & Modelling Time To Default Using Survival Analysis Techniques
Kenya's financial sector has recorded double-digit profit growth for most of the past decade, with the loan portfolio growing fastest, while economic growth has averaged about 5%. Of particular concern is that the banking sector has been growing faster than the rest of the economy. This could leave institutions and households unable to repay their debts, increasing non-performing loans, and banks would then be required to hold higher capital buffers to absorb possible shocks such as the global financial crisis of late 2008. Banks are required to set aside amounts to cover non-performing loans, and this reduces profits because the amounts set aside are a deductible expense. The loan portfolio should therefore be managed effectively to keep credit risk at manageable levels. Effective management of a growing portfolio requires frequent review of the credit granting process to ensure that only creditworthy individuals are granted loans. Banks have traditionally used credit scoring to differentiate 'bad' customers from 'good' customers in their credit granting process. However, the idea of a Markov chain, in which borrowers move from one credit state to another, highlights that a borrower's status is dynamic rather than static. Credit scoring imposes a static view on this dynamic behaviour, and the focus of study is now not if but when borrowers will default. Given this dynamic behaviour, lending institutions need to make their credit granting criteria robust enough to score not only for risk but also for profitability. This would ensure that they choose customers whose time to default is long, maximizing profits because the interest charged will compensate for, or even exceed, the losses resulting from default.
This paper therefore explores the loan data and uses survival analysis techniques, specifically the Kaplan-Meier and Cox Proportional Hazard (PH) model approaches, to model time to default using borrowers' application characteristics: gender, age, income, term of loan, income commitment and banking history. Both the log-rank and Wilcoxon tests are used to assess whether the survival curves differ across the levels of each categorical variable. The explanatory variables found to be significant in the univariate analysis are then assessed for time dependency, and a multivariate Cox PH model with time-independent covariates is fitted. The results showed that, of the 6 application variables, only income and banking history were significant. It was therefore not meaningful to classify borrowers on the basis of their gender, age, term of loan and commitments to the bank, as these application variables did not affect the risk of default. As customers move from low income (< KES 100,000) to high income (≥ KES 300,000), the rate of default decreases by 51% when all other variables are held constant. Customers with a banking history of < 6 months experienced a default rate that is 2.3 times higher than those who have banked for > 24 months. Customers with a banking history of 6-12 months have a default rate that is 96% higher than those who have banked for > 24 months.
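As an illustration of the Kaplan-Meier approach named above, the following is a minimal sketch in plain Python, not the paper's actual implementation. The function names `kaplan_meier` and `hazard_ratio_to_percent_change` and the toy loan records are hypothetical and are not drawn from the study's data. The second helper assumes the usual Cox PH reading, in which a reported percentage change in default rate corresponds to a hazard ratio relative to the reference group.

```python
from collections import Counter


def kaplan_meier(durations, defaulted):
    """Kaplan-Meier estimate of the survival function.

    durations: time on book for each loan (e.g. months)
    defaulted: True if the loan defaulted at that time, False if censored
    Returns a list of (time, survival probability) at each default time.
    """
    defaults_at = Counter(t for t, d in zip(durations, defaulted) if d)
    exits_at = Counter(durations)  # defaults and censorings leaving the risk set
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits_at):
        d = defaults_at.get(t, 0)
        if d:
            # Product-limit step: S(t) = S(t-) * (1 - d / n at risk)
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits_at[t]
    return curve


def hazard_ratio_to_percent_change(hazard_ratio):
    """Percent change in hazard relative to the reference group."""
    return (hazard_ratio - 1.0) * 100.0


# Hypothetical toy data, for illustration only.
months = [3, 5, 5, 8, 12, 12, 15, 20]
default = [True, True, False, True, False, True, False, False]

curve = kaplan_meier(months, default)
for t, s in curve:
    print(f"month {t:>2}: S(t) = {s:.3f}")

# Under the assumed reading, a hazard ratio of 0.49 is a 51% decrease
# and a hazard ratio of 1.96 is a 96% increase.
print(round(hazard_ratio_to_percent_change(0.49)))
print(round(hazard_ratio_to_percent_change(1.96)))
```

Censored loans (those that had not defaulted by the end of observation) leave the risk set without lowering the survival estimate, which is what separates this estimator from a simple default proportion.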