Using Transfer Learning to Leverage Large Unlabelled Datasets to Improve Classification Models in Cases With Small Labelled Datasets: Application to Paediatric Diagnostic and Prognostic Models
Abstract
Diagnostic and prognostic models based on machine learning can improve diagnosis and identification of patients at risk of adverse health outcomes. Healthcare delivery can thus be improved in low- and middle-income country (LMIC) settings, where making accurate diagnoses remains a challenge because of a lack of essential laboratory tests and trained medical staff. Training machine learning models requires large labelled datasets, which are often unavailable in LMICs. Moreover, models developed in high-income settings may not generalize to LMICs because of differences in the context in which the underlying data were collected. Transfer learning, which stores knowledge gained in solving one problem and incorporates that knowledge when solving a different but related problem, can overcome the challenges of training machine learning models using small labelled datasets. Transfer learning can extract knowledge from large unlabelled datasets or datasets from a different setting and incorporate that knowledge when training models using small labelled datasets, making it potentially applicable to settings with sparse or unlabelled data, such as LMICs. Transfer learning has been applied to natural images and natural language processing, but its performance on healthcare data such as medical images, bio-signals and tabular datasets (e.g. clinical signs and symptoms) has not been evaluated.
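The transfer-learning idea described above can be illustrated with a minimal numpy sketch (not the thesis's actual models; all data and dimensions here are synthetic): a logistic regression classifier is pre-trained with gradient descent on a large "source" dataset, and its weights are then fine-tuned on a small related "target" dataset instead of starting from scratch.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_logreg(X, y, w, b, lr=0.1, epochs=200):
    """Gradient-descent logistic regression, starting from weights (w, b)."""
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # sigmoid predictions
        g = p - y                                 # gradient of the log-loss
        w = w - lr * (X.T @ g) / len(y)
        b = b - lr * g.mean()
    return w, b

# Synthetic stand-ins: a large "source" dataset and a small "target"
# dataset that share the same underlying decision rule.
Xs = rng.normal(size=(2000, 5)); ys = (Xs[:, 0] + Xs[:, 1] > 0).astype(float)
Xt = rng.normal(size=(30, 5));   yt = (Xt[:, 0] + Xt[:, 1] > 0).astype(float)

# Pre-train on the large source dataset ...
w, b = train_logreg(Xs, ys, np.zeros(5), 0.0)
# ... then fine-tune the transferred weights on the small target dataset.
w, b = train_logreg(Xt, yt, w, b, epochs=50)

acc = (((Xt @ w + b) > 0).astype(float) == yt).mean()
```

The key point is the second call: training starts from the source-task weights rather than zeros, so the small target dataset only has to adjust an already-informative model.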
This study evaluates the use of transfer learning in improving the performance of diagnostic and
prognostic models fitted using small labelled datasets. Three types of datasets were evaluated.
Firstly, paediatric chest x-rays were classified into WHO standardized categories for diagnosis of
pneumonia. Secondly, physiological signals from a pulse oximeter were used to predict
hospitalization status, and lastly, tabular data comprising clinical signs and symptoms were used to
predict positive blood culture results (bacteraemia). The performance of models fitted with and without transfer learning was compared for each dataset.
Transfer learning approaches using multi-task learning and pre-trained models (supervised and unsupervised pre-training) were used to leverage a large chest x-ray dataset from a high-income setting to improve the performance of models trained on a small chest x-ray dataset from seven LMICs. A novel method incorporating annotations from multiple human readers of chest x-rays is proposed and evaluated. Self-supervised learning (SSL) methods were used to extract features from pulse oximeter signals and to initialize end-to-end deep learning models for predicting hospitalization status (unsupervised pre-training). Features extracted using SSL were used to predict
hospitalization using logistic regression. Finally, deep learning models for predicting bacteraemia using clinical signs and symptoms were compared with logistic regression models. The deep learning models were initialized either randomly or using weights from auto-encoders (unsupervised pre-training).
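The unsupervised pre-training pipeline described above (fit a feature extractor on unlabelled data, then train a simple classifier on the extracted features) can be sketched in numpy. As a stand-in for the thesis's SSL models, this sketch uses a linear autoencoder, whose optimal encoder spans the top principal components and can therefore be obtained directly from an SVD; all data and dimensions are synthetic and illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-ins: many unlabelled signals, few labelled ones.
X_unlab = rng.normal(size=(1000, 20))
X_lab = rng.normal(size=(50, 20))
y_lab = (X_lab[:, :3].sum(axis=1) > 0).astype(float)

# "Pre-training": fit a linear autoencoder on the unlabelled data.
# Its optimal encoder is given by the top right-singular vectors,
# so we compute it in closed form with an SVD (decoder = encoder.T).
mu = X_unlab.mean(axis=0)
_, _, Vt = np.linalg.svd(X_unlab - mu, full_matrices=False)
encoder = Vt[:8].T                      # maps 20-dim signals to 8-dim features

# Downstream task: logistic regression on the extracted features.
Z = (X_lab - mu) @ encoder
w, b = np.zeros(8), 0.0
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))  # sigmoid predictions
    g = p - y_lab                            # gradient of the log-loss
    w = w - 0.1 * (Z.T @ g) / len(y_lab)
    b = b - 0.1 * g.mean()
```

Only the small labelled set is used for the supervised step; the encoder's weights come entirely from the unlabelled data, which is what makes the approach attractive when labels are scarce.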
Supervised and unsupervised pre-training marginally improved classification performance on chest x-rays (accuracy 0.61 vs 0.59 and 0.60 vs 0.59, respectively). Multi-task learning did not improve classification of chest x-rays, while incorporating annotations from multiple human readers improved performance (accuracy 0.62 vs 0.61). Features extracted from pulse oximeter signals
using SSL models were predictive of hospitalization. The AUCs of logistic regression models trained on features extracted using SSL models were 0.83 and 0.80 for the SSL model trained using labelled data only and the SSL model trained using both labelled and unlabelled data, respectively. End-to-end deep learning models had AUCs of 0.73 when initialized randomly, 0.77 when initialized using the SSL model trained on labelled data only, and 0.80 when initialized using the SSL model trained on both labelled and unlabelled pulse oximeter signals. Logistic regression models for predicting positive
blood cultures performed better than deep learning models when trained on small datasets (AUC 0.67 vs 0.62) and marginally worse on large datasets (AUC 0.70 vs 0.71). Initializing deep learning models with weights from auto-encoders had no effect on the performance of models for predicting bacteraemia.
Our results suggest that transfer learning can improve the performance of models trained on homogeneous data types such as medical images and bio-signals, but may have no effect on heterogeneous tabular data. SSL is an effective technique for extracting features from bio-signals that could be used to predict various physiological parameters such as respiratory rate. Deep learning models perform worse than logistic regression in predicting bacteraemia from clinical signs and symptoms when the dataset is small.
Publisher
University of Nairobi
Rights
Attribution-NonCommercial-NoDerivs 3.0 United States
http://creativecommons.org/licenses/by-nc-nd/3.0/us/