Recent advances in multimodal brain imaging and in high-throughput genotyping and sequencing techniques provide exciting new opportunities to improve our understanding of brain structure and neural dynamics, their genetic architecture, and their influences on cognition and behavior. However, data privacy and security concerns have inhibited data sharing across institutes. Emerging multi-site collaborative data analysis can address these concerns and facilitate the sharing of data and computing resources. In collaborative data analysis, each participating institute keeps its own data, analyzes them locally, and shares only the computed results with a server. The server communicates with all institutes and updates the local models, so that the trained machine learning models indirectly use all of the data and are shared with all institutes. Although some distributed and parallel computation techniques were recently proposed to address big data mining problems, most of them are synchronous models. Asynchronous distributed learning methods can be much more efficient, because in each round they allow the server to update the model with information from a single worker node, without waiting for slow worker nodes. However, convergence analysis for asynchronous distributed algorithms is much more difficult, because variable updates are inconsistent across nodes. Thus, it is challenging to design efficient distributed machine learning algorithms for collaborative big data analysis. The research objective of this project is to address the computational challenges in emerging multi-site collaborative data mining for brain big data.
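As a rough illustration of the asynchronous scheme described above, the following Python sketch simulates a server that applies an update as soon as any one site reports, while each site computes its gradient on a possibly stale copy of the model. It is not the project's algorithm: the least-squares objective, the random choice of which site reports next, and all names (`make_site_data`, `local_gradient`, `async_train`) are assumptions made for illustration only.

```python
import numpy as np

def make_site_data(rng, n, w_true):
    """Synthetic, noiseless data held privately by one site (illustrative only)."""
    X = rng.normal(size=(n, w_true.size))
    y = X @ w_true
    return X, y

def local_gradient(w, X, y):
    """Least-squares gradient computed from one site's data; raw data never leave the site."""
    return X.T @ (X @ w - y) / len(y)

def async_train(sites, dim, steps=3000, lr=0.02, seed=0):
    """Simulate asynchronous updates: one site reports per round, others are not waited for."""
    rng = np.random.default_rng(seed)
    w = np.zeros(dim)                            # model held by the server
    local_copies = [w.copy() for _ in sites]     # each site's last-fetched (possibly stale) model
    for _ in range(steps):
        k = rng.integers(len(sites))             # assumed: whichever site happens to finish first
        X, y = sites[k]
        g = local_gradient(local_copies[k], X, y)  # gradient computed on the stale copy
        w = w - lr * g                           # server updates immediately
        local_copies[k] = w.copy()               # only the reporting site receives the new model
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    w_true = np.array([1.0, -2.0, 0.5])
    sites = [make_site_data(rng, 200, w_true) for _ in range(3)]
    print(async_train(sites, dim=3))
```

Because a site's copy is refreshed only when that site reports, the gradients it sends are computed on models that other sites have since changed. This staleness is the inconsistency that makes convergence analysis harder than in the synchronous case.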
This project seeks to design new, efficient asynchronous distributed machine learning algorithms with rigorous theoretical foundations for multi-site collaborative brain big data mining, and to create large-scale computational strategies and effective software tools that reveal sophisticated relationships among heterogeneous brain data. The project develops asynchronous distributed machine learning and principled big data mining models for a comprehensive study of brain imaging genomics and connectomics. Specifically, the principal investigators investigate: 1) collaborative genotype and phenotype association studies using new asynchronous doubly stochastic proximal gradient algorithms; 2) communication-efficient multi-site collaborative data integration models that integrate imaging genomics data to predict outcomes of interest; 3) speeding up collaborative deep learning with asynchronous distributed algorithms, with applications in temporal cognitive change prediction; and 4) new graph convolutional deep learning models for brain network mining. The innovation lies in integrating new distributed machine learning and data-intensive computing with brain imaging genomics and connectomics, a combination that holds great promise for a systems biology of the brain. The developed methods and tools are expected to impact other neuroimaging, genomics, and neuroscience research, and to enable investigators working on brain science to test their scientific hypotheses effectively. This project will also facilitate the development of novel educational tools.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.