dc.contributor.author: Yahaya, Mahama
dc.contributor.author: Guo, Runhua
dc.contributor.author: Jiang, Xinguo
dc.contributor.author: Bashir, Kamal
dc.contributor.author: Matara, Caroline
dc.contributor.author: Xu, Shiwei
dc.date.accessioned: 2021-06-22T05:46:41Z
dc.date.available: 2021-06-22T05:46:41Z
dc.date.issued: 2021-03
dc.identifier.citation: Yahaya M, Guo R, Jiang X, Bashir K, Matara C, Xu S. Ensemble-based model selection for imbalanced data to investigate the contributing factors to multiple fatality road crashes in Ghana. Accid Anal Prev. 2021 Mar;151:105851. doi: 10.1016/j.aap.2020.105851. Epub 2020 Dec 28. PMID: 33383521. [en_US]
dc.identifier.uri: https://pubmed.ncbi.nlm.nih.gov/33383521/
dc.identifier.uri: http://erepository.uonbi.ac.ke/handle/11295/155044
dc.description.abstract: The study aims to identify relevant variables to improve the prediction performance of the crash injury severity (CIS) classification model. Unfortunately, the CIS database is invariably characterized by the class imbalance. For instance, the samples of multiple fatal injury (MFI) severity class are typically rare as opposed to other classes. The imbalance phenomenon may introduce a prediction bias in favour of the majority class and affect the quality of the learning algorithm. The paper proposes an ensemble-based variable ranking scheme that incorporates the data resampling. At the data pre-processing level, majority weighted minority oversampling (MWMOTE) is employed to treat the imbalanced training data. Ensemble of classifiers induced from the balanced data is used to evaluate and rank the individual variables according to their importance to the injury severity prediction. The relevant variables selected are then applied to the balanced data to form a training set for the CIS classification modelling. An empirical comparison is conducted through considering the variable ranking by: 1) the learning of single inductive algorithm with imbalanced data where the relevant variables are applied to the imbalanced data to form the training data; 2) the learning of single inductive algorithm with MWMOTE data and the relevant variables identified are applied to the balanced data to form the training data; and 3) the learning of ensembles with imbalanced data where the relevant variables identified are applied to the imbalanced data to form the training data. Bayesian Networks (BNs) classifiers are then developed for each ranking method, where nested subsets of the top ranked variables are adopted. The model predictions are captured in four performance indicators in the comparative study. Based on three-year (2014-2016) crash data in Ghana, the empirical results show that the proposed method is effective to identify the most prolific predictors of the CIS level. Finally, based on the inference results of BNs developed on the best subset, the study offers the most probable explanations to the occurrence of MFI crashes in Ghana. [en_US]
dc.language.iso: en [en_US]
dc.publisher: University of Nairobi [en_US]
dc.rights: Attribution-NonCommercial-NoDerivs 3.0 United States [*]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/3.0/us/ [*]
dc.subject: Class imbalance; Classification; Ensemble classifiers; Model selection; Multiple fatal injury crash; Oversampling. [en_US]
dc.title: Ensemble-based model selection for imbalanced data to investigate the contributing factors to multiple fatality road crashes in Ghana [en_US]
dc.type: Article [en_US]


Files in this item

There are no files associated with this item.

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 United States