The brain imaging community is benefiting greatly from the extensive data sharing efforts currently underway (refs. 5, 10). However, existing strategies leave a significant gap, which we propose to address: they focus on anonymized, post-hoc sharing of either 1) full raw or preprocessed data [in the case of open studies] or 2) manually computed summary measures [such as hippocampal volume (ref. 11), in the case of closed (or not yet shared) studies]. Current approaches to data sharing often impose significant logistical hurdles on both the investigator sharing the data and the individual requesting it (e.g., multiple data sharing agreements and approvals are often required from US and international institutions). This needs to change, so that the scientific community becomes a venue where data can be collected, managed, widely shared, and analyzed, while also opening up access to the (many) data sets that are not currently available (see the recent overview from our group, ref. 2). The large amount of existing data requires an approach that can analyze data in a distributed way while leaving control of the source data with the individual investigator; this motivates a dynamic, decentralized way of approaching large-scale analyses. We propose a peer-to-peer system called the Collaborative Informatics and Neuroimaging Suite Toolkit for Anonymous Computation (COINSTAC). The system will provide an independent, open, no-strings-attached tool that performs analysis on datasets distributed across different locations. Thus, the step of actually aggregating data can be avoided, while the strength of large-scale analyses is retained. To achieve this, in Aim 1, we propose uniform data interfaces that will make it easy to share and cooperate. Robust and novel quality assurance and replicability tools will also be incorporated.
Collaboration and data sharing will be done through forming temporary (need- and project-based) virtual clusters of studies that perform automatically generated local computation on their respective data and aggregate statistics in global inference procedures. The communal organization will provide a continuous stream of large-scale projects that can be formed and completed without the need to create new rigid organizations or project-oriented storage vaults.
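As a rough illustration of local computation followed by global aggregation, the sketch below fits a one-covariate linear regression across sites that exchange only summary statistics. It is not the COINSTAC implementation; the names `LocalStats`, `local_stats`, and `global_fit` are hypothetical, and the choice of ordinary least squares is an assumption made for brevity.

```python
from dataclasses import dataclass


@dataclass
class LocalStats:
    """Sums one site shares; no subject-level rows leave the site."""
    n: int
    sum_x: float
    sum_y: float
    sum_xx: float
    sum_xy: float


def local_stats(x, y):
    """Run at each site on its own data (hypothetical helper)."""
    return LocalStats(
        n=len(x),
        sum_x=sum(x),
        sum_y=sum(y),
        sum_xx=sum(a * a for a in x),
        sum_xy=sum(a * b for a, b in zip(x, y)),
    )


def global_fit(site_stats):
    """Pool the per-site sums and solve the least-squares normal equations."""
    n = sum(s.n for s in site_stats)
    sx = sum(s.sum_x for s in site_stats)
    sy = sum(s.sum_y for s in site_stats)
    sxx = sum(s.sum_xx for s in site_stats)
    sxy = sum(s.sum_xy for s in site_stats)
    denom = n * sxx - sx * sx  # zero only if all x values are identical
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return slope, intercept


# Two sites whose data follow y = 2x + 1; only the sums are pooled.
site_a = local_stats([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])
site_b = local_stats([3.0, 4.0], [7.0, 9.0])
slope, intercept = global_fit([site_a, site_b])
```

The pooled estimate is identical to the one obtained by fitting the concatenated data, because the sums are sufficient statistics for this model. Sharing exact sums is not by itself a privacy guarantee, which is why privacy-preserving variants are a separate concern.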
In Aim 2, we will develop, evaluate, and incorporate privacy-preserving algorithms to ensure that the data used are not re-identifiable even with multiple re-uses. We will also develop advanced distributed and privacy-preserving approaches for several key multivariate families of algorithms (general linear model, matrix factorization [e.g., independent component analysis], classification) to estimate intrinsic networks and perform data fusion. Finally, in Aim 3, we will demonstrate the utility of this approach in a proof-of-concept study through distributed analyses of substance abuse datasets across national and international venues with multiple imaging modalities.
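The text does not name a specific privacy mechanism. Purely as an assumption for illustration, the sketch below uses the Laplace mechanism from differential privacy to release a site-level mean. The function names, the clipping bounds, and the treatment of the sample size as public are all hypothetical choices, not part of the proposal.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d.
    exponential draws, each with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_mean(values, lower, upper, epsilon, rng):
    """Release a noisy mean (assumed setup: public sample size,
    neighbouring datasets differ by replacing one record)."""
    # Clip so that any single record can move the mean by a bounded amount.
    clipped = [min(max(v, lower), upper) for v in values]
    true_mean = sum(clipped) / len(clipped)
    sensitivity = (upper - lower) / len(clipped)
    return true_mean + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)  # seeded only to make the example repeatable
noisy = private_mean([1.0, 2.0, 3.0, 10.0], 0.0, 5.0, epsilon=0.5, rng=rng)
```

Each release of this kind spends part of a privacy budget: under basic composition, the epsilon values of repeated releases on the same data add up, which is one way to reason about multiple re-uses.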
Hundreds of millions of dollars have been spent to collect human neuroimaging data for clinical and research purposes, and many of the resulting datasets lack data sharing agreements or include sensitive data, such as genetics, that are not easily shared. Opportunities for large-scale aggregated analyses to infer health-relevant facts create new challenges in protecting the privacy of individuals' data. Open sharing of raw data, though desirable from the research perspective and growing rapidly, is not a good solution for the large number of datasets that carry additional privacy risks or IRB concerns. The COINSTAC solution we are proposing will capture this 'missing data' and allow for pooling of both open and 'closed' repositories by developing privacy-preserving versions of widely used algorithms and incorporating them within an easy-to-use platform that enables distributed computation. In addition, COINSTAC will accelerate research on both open and closed data by offering a distributed computational solution for a large toolkit of widely used algorithms.