dc.description.abstract | Large amounts of data are often required to train and deploy useful machine learning models in
industry. Smaller enterprises do not have the luxury of accessing enough data for machine
learning. For privacy-sensitive fields such as banking, insurance and healthcare, aggregating data
in a data warehouse poses challenges of data security and limited computational resources.
These challenges are critical when developing machine learning algorithms in industry. Several
attempts have been made to address the above challenges by using distributed learning
techniques such as federated learning over disparate data stores in order to circumvent the need
for centralised data aggregation.
This paper proposes an improved algorithm to securely train deep neural networks over several
data sources in a distributed way, eliminating the need to centrally aggregate or share the
data and thus preserving privacy. The proposed method allows deep neural networks to be trained
using data from multiple de-linked nodes in a distributed environment and secures the
representation shared during training. Only a representation of the trained models
(network architecture and weights) is shared.
The algorithm was evaluated on existing healthcare patient data, and the performance of this
implementation was compared to that of a regular deep neural network trained on a single
centralised architecture. This algorithm will pave the way for distributed training of neural
networks in privacy-sensitive applications where raw data may not be shared directly or
where centrally aggregating the data in a data warehouse is not feasible.
Index Terms: Big Data, Distributed Computing, Deep Learning | en_US |