Deep learning methods are showing great promise for advancing cancer research and could potentially improve clinical decision making in cancers such as primary brain glioma, where deep learning models have recently shown promising results in predicting isocitrate dehydrogenase (IDH) mutation and survival. A major challenge thwarting this research, however, is the requirement for large quantities of labeled image data to train deep learning models. Efforts to create large public centralized collections of image data are hindered by barriers to data sharing, costs of image de-identification, patient privacy concerns, and control over how data are used. Current deep learning models built using data from one or a few institutions are limited by potential overfitting and poor generalizability. Instead of centralizing or sharing patient images, we aim to distribute the training of deep learning models across institutions, with computations performed on their local image data. Although our preliminary results demonstrate the feasibility of this approach, there are three key challenges to translating these methods into research practice: (1) data are heterogeneous across institutions in amount and quality, which could impair the distributed computations, (2) there are data security and privacy concerns, and (3) there are no software packages that implement distributed deep learning with medical images. We tackle these challenges by (1) optimizing and expanding our current methods of distributed deep learning to address data variability and data privacy/security, (2) creating a freely available software system for building deep learning models on multi-institutional data using distributed computation, and (3) evaluating our system on deep learning problems in example use cases of classification and clinical prediction in primary brain cancer.
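As an illustration only, training across institutions without moving patient images can be sketched with federated averaging, one common distributed-training scheme. The text above does not state which scheme the project uses, so this is an assumption, not a description of the proposed system. In this toy Python sketch, logistic regression stands in for a deep network, each "site" holds synthetic data, and every name (local_update, federated_average, train_federated, make_site) is hypothetical.

```python
import math
import random

def local_update(weights, data, lr=0.1, epochs=5):
    """Run a few epochs of SGD for logistic regression on one site's private data."""
    w = list(weights)
    for _ in range(epochs):
        for x, y in data:
            z = sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))      # predicted probability
            for i, xi in enumerate(x):
                w[i] -= lr * (p - y) * xi       # gradient step on the log loss
    return w

def federated_average(site_weights, site_sizes):
    """Average site models, weighting each by its number of local examples."""
    total = sum(site_sizes)
    dim = len(site_weights[0])
    return [
        sum(w[i] * n for w, n in zip(site_weights, site_sizes)) / total
        for i in range(dim)
    ]

def train_federated(sites, dim, rounds=20):
    """Only model weights leave a site; the raw records in `sites` are never pooled."""
    global_w = [0.0] * dim
    for _ in range(rounds):
        updates = [local_update(global_w, data) for data in sites]
        global_w = federated_average(updates, [len(d) for d in sites])
    return global_w

def make_site(n, seed):
    """Synthetic stand-in for one institution's labeled data (assumption)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x1 = rng.uniform(-1, 1)
        data.append(([1.0, x1], 1 if x1 > 0 else 0))  # bias term plus one feature
    return data

# Sites of very different sizes, mimicking heterogeneity in data amount.
sites = [make_site(20, 0), make_site(200, 1), make_site(50, 2)]
model = train_federated(sites, dim=2)
```

Weighting each site's update by its sample count is one simple way to account for unequal data amounts across sites; it does not by itself address differences in data quality or the privacy of the exchanged weights.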
Our approach is innovative in developing distributed deep learning methods that address variations in data among institutions, protect patient privacy during distributed computations, and enable sites to discover pertinent datasets and participate in creating deep learning models. Our work will be significant and impactful because it overcomes critical hurdles that researchers face in tapping into multi-institutional patient data: it allows deep learning models to be built on large collections of image data that are more representative of disease than data acquired from a single institution, while avoiding the obstacles to inter-institutional sharing of patient data. Ultimately, our methods will enable researchers to collaboratively develop more generalizable deep learning applications to advance cancer care by unlocking access to and leveraging huge amounts of multi-institutional image data. Although our clinical use case in developing this technology is primary brain cancer, our methods will generalize to all cancers, as well as to types of data other than images, and will ultimately lead to robust deep learning applications that are expected to improve clinical care and outcomes in many types of cancer.
We develop technology that will enable researchers to tap into the enormous amount of imaging data held at multiple institutions to create deep learning models for cancer applications without requiring sharing of patient data. Our work will thus enable development of deep learning models for clinical decision making in cancer that are more robust than models currently built on data from single institutions. Although our focus is improving decision making in primary brain cancer, our methods and tools are generalizable and will be broadly applicable to all cancers, with the potential for improvement in clinical care and patient health.