dc.description.abstract | Despite rapid developments in information technology, mapping graduates' skills to
industry roles in a practical way remains a challenge. Attempts have been made to address it by posing it as a multiclass classification problem and
solving it using machine learning techniques. However, existing approaches do not appear to use attributes and
machine learning structures relevant to the problem, and hence their results may not be reliable. For example,
although industry roles in organizations are structured hierarchically, many studies have
approached this problem using flat rather than hierarchical methods. The relevant attributes, the hierarchical
structure that correctly reflects the hierarchy of industry roles, or both, remain unknown for an effective model for
mapping graduates' skills to industry roles.
Currently, hierarchical methods have not been applied to mapping skills to industry roles, despite their many
benefits over flat methods. In other areas where they have been used, however, the classification approach
contradicts the underlying structure of the problem, resulting in multiple-label prediction problems. As a
result, this study presents an investigation that posed the mapping of skills to industry roles as a hierarchically
structured multiclass problem, applying a machine learning structure that correctly reflects the hierarchy of
industry roles. The aim was to demonstrate, using a case study, how to build a machine learning model
for mapping graduates' skills to hierarchically structured industry roles. This was achieved by establishing
both the underlying structural characteristics of industry roles that correctly reflect their hierarchy (the
concepts required for target classes) and the concepts appropriate as attributes for hierarchical machine
learning, before building and evaluating the mapping model. The model is based on an underlying taxonomic
structure designed to correctly reflect the hierarchical structure of industry roles. A literature
analysis of three theoretical frameworks provided the basis for establishing appropriate attributes for the machine
learning investigation, after which a hierarchical classification strategy was designed to generate the model
and a prototype was constructed. An experimental design was adopted using four machine learning
techniques (Logistic Regression, K-Nearest Neighbor, SVM, and Naïve Bayes). The investigation used a benchmark
dataset and skills profile data for 113 software engineering employees, collected using stratified random sampling
from various software development firms in Nairobi. Experiments to evaluate the
performance and validity of the model were designed using a repeated 5-fold cross-validation procedure.
Performance reported for carefully selected benchmarks that used multiclass classification methods was adopted to
validate the results.
The findings revealed five appropriate attributes for building a model for mapping skills to industry roles. The
best model was the one induced with SVM, with an average generalization accuracy of 67% across three
datasets. On the benchmark dataset, our model achieved an accuracy of 85%, better than the 82% reported
by a selected benchmark on a similar dataset. These results appear fairly consistent with those achieved by
similar hierarchical models reported in other problem domains, such as proteins (53.3%) and music (61%).
In conclusion, the research objective was fulfilled with the following contributions: a conceptual model,
an ML architecture for the model, a software prototype, a hierarchical mapping framework, research findings,
datasets, and a literature survey. These will benefit researchers in general (students, universities, and industry)
and especially the government in developing an effective policy for training evaluation that ensures graduates
are relevant to the industry. | en_US |