Using Transfer Learning to Leverage Large Unlabelled Datasets to Improve Classification Models in Cases With Small Labelled Datasets: Application to Paediatric Diagnostic and Prognostic Models
Abstract
Diagnostic and prognostic models based on machine learning can improve diagnosis and identification of patients at risk of adverse health outcomes. Healthcare delivery can thus be improved in low- and middle-income country (LMIC) settings, where making accurate diagnoses remains a challenge because of a lack of essential laboratory tests and trained medical staff. Training machine learning models requires large labelled datasets, which are often unavailable in LMICs. Moreover, models developed in high-income settings may not generalize to LMICs because of differences in the context in which the underlying data were collected. Transfer learning, which stores knowledge gained in solving one problem and incorporates that knowledge when solving a different but related problem, can overcome the challenges of training machine learning models using small labelled datasets. Transfer learning can extract knowledge from large unlabelled datasets or datasets from a different setting and incorporate that knowledge when training models using small labelled datasets, making it potentially applicable to settings with sparse or unlabelled data, such as LMICs. Transfer learning has been applied to natural images and natural language processing, but its performance on healthcare data such as medical images, bio-signals and tabular datasets (e.g. clinical signs and symptoms) has not been evaluated.
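The transfer-learning idea described above can be illustrated with a minimal numpy sketch (not the thesis's actual models; all data and dimensions here are synthetic): a logistic regression classifier is pre-trained with gradient descent on a large "source" dataset, and its weights are then fine-tuned on a small related "target" dataset instead of starting from scratch.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_logreg(X, y, w, b, lr=0.1, epochs=200):
    """Gradient-descent logistic regression, starting from weights (w, b)."""
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # sigmoid predictions
        g = p - y                                 # gradient of the log-loss
        w = w - lr * (X.T @ g) / len(y)
        b = b - lr * g.mean()
    return w, b

# Synthetic stand-ins: a large "source" dataset and a small "target"
# dataset that share the same underlying decision rule.
Xs = rng.normal(size=(2000, 5)); ys = (Xs[:, 0] + Xs[:, 1] > 0).astype(float)
Xt = rng.normal(size=(30, 5));   yt = (Xt[:, 0] + Xt[:, 1] > 0).astype(float)

# Pre-train on the large source dataset ...
w, b = train_logreg(Xs, ys, np.zeros(5), 0.0)
# ... then fine-tune the transferred weights on the small target dataset.
w, b = train_logreg(Xt, yt, w, b, epochs=50)

acc = (((Xt @ w + b) > 0).astype(float) == yt).mean()
```

The key point is the second call: training starts from the source-task weights rather than zeros, so the small target dataset only has to adjust an already-informative model.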
This study evaluates the use of transfer learning in improving the performance of diagnostic and
prognostic models fitted using small labelled datasets. Three types of datasets were evaluated.
Firstly, paediatric chest x-rays were classified into WHO standardized categories for diagnosis of
pneumonia. Secondly, physiological signals from a pulse oximeter were used to predict
hospitalization status, and lastly, tabular data comprising clinical signs and symptoms were used to
predict positive blood culture results (bacteraemia). The performance of models fitted with and without transfer learning was compared for each dataset.
Transfer learning approaches using multi-task learning and pre-trained models (supervised and unsupervised pre-training) were used to leverage a large chest x-ray dataset from a high-income setting to improve the performance of models trained on a small chest x-ray dataset from seven LMICs. A novel method incorporating annotations from multiple human readers of chest x-rays is proposed and evaluated. Self-supervised learning (SSL) methods were used to extract features from pulse oximeter signals and to initialize end-to-end deep learning models for predicting hospitalization status (unsupervised pre-training). Features extracted using SSL were used to predict
hospitalization using logistic regression. Finally, deep learning models for predicting bacteraemia using clinical signs and symptoms were compared with logistic regression models. The deep learning models were initialized either randomly or using weights from auto-encoders (unsupervised pre-training).
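The unsupervised pre-training pipeline described above (fit a feature extractor on unlabelled data, then train a simple classifier on the extracted features) can be sketched in numpy. As a stand-in for the thesis's SSL models, this sketch uses a linear autoencoder, whose optimal encoder spans the top principal components and can therefore be obtained directly from an SVD; all data and dimensions are synthetic and illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-ins: many unlabelled signals, few labelled ones.
X_unlab = rng.normal(size=(1000, 20))
X_lab = rng.normal(size=(50, 20))
y_lab = (X_lab[:, :3].sum(axis=1) > 0).astype(float)

# "Pre-training": fit a linear autoencoder on the unlabelled data.
# Its optimal encoder is given by the top right-singular vectors,
# so we compute it in closed form with an SVD (decoder = encoder.T).
mu = X_unlab.mean(axis=0)
_, _, Vt = np.linalg.svd(X_unlab - mu, full_matrices=False)
encoder = Vt[:8].T                      # maps 20-dim signals to 8-dim features

# Downstream task: logistic regression on the extracted features.
Z = (X_lab - mu) @ encoder
w, b = np.zeros(8), 0.0
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))  # sigmoid predictions
    g = p - y_lab                            # gradient of the log-loss
    w = w - 0.1 * (Z.T @ g) / len(y_lab)
    b = b - 0.1 * g.mean()
```

Only the small labelled set is used for the supervised step; the encoder's weights come entirely from the unlabelled data, which is what makes the approach attractive when labels are scarce.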
Supervised and unsupervised pre-training marginally improved classification performance on chest x-rays (accuracy 0.61 vs 0.59 and 0.60 vs 0.59, respectively). Multi-task learning did not improve classification of chest x-rays, while incorporating annotations from multiple human readers improved performance (accuracy 0.62 vs 0.61). Features extracted from pulse oximeter signals
using SSL models were predictive of hospitalization. The AUCs of logistic regression models trained on features extracted using SSL models were 0.83 and 0.80 for the SSL model trained using labelled data only and the SSL model trained using both labelled and unlabelled data, respectively. End-to-end deep learning models had AUCs of 0.73 when initialized randomly, 0.77 when initialized using the SSL model trained on labelled data only, and 0.80 when initialized using the SSL model trained on both labelled and unlabelled pulse oximeter signals. Logistic regression models for predicting positive
blood cultures performed better than deep learning models when trained on small datasets (AUC 0.67 vs 0.62) and marginally worse on large datasets (AUC 0.70 vs 0.71). Initializing deep learning models with weights from auto-encoders had no effect on the performance of models for predicting bacteraemia.
Our results suggest that transfer learning can improve the performance of models trained on homogeneous data types such as medical images and bio-signals, but may have no effect on heterogeneous tabular data. SSL is an effective technique for extracting features from bio-signals that could be used to predict various physiological parameters such as respiratory rate. Deep learning models perform worse than logistic regression in predicting bacteraemia from clinical signs and symptoms when the dataset is small.
Publisher
University of Nairobi
Rights
Attribution-NonCommercial-NoDerivs 3.0 United States
http://creativecommons.org/licenses/by-nc-nd/3.0/us/