Neuroscientific data contain information from an incredible diversity of species, are generated by a plethora of devices, and encapsulate the results of scientific thinking and decision making. Most of these data remain confined within laboratories and are not accessible to the broader scientific community. The research projects awarded under the BRAIN Initiative are generating a diverse collection of data that can transform and accelerate the pace of discovery. These datasets are large, ranging in size from gigabytes to petabytes, and represent diverse data types and assorted metadata. To integrate, rather than further isolate, these numerous efforts, there is a need to archive, preserve, share, and process data in a way that is meaningful to neuroscience researchers. Any technological solution should reduce redundancy of storage and computation, allow computing near the data, and provide easy access, protected when appropriate, to researchers and citizen scientists. Given the scale of these initiatives and the range of sample sizes and data types, any solution should also account for the broad range of technical expertise in the community. It should therefore allow easy engagement with and ingestion into an archive, while supporting the education and training of scientists in using these technologies. To solve these problems, we propose DANDI: Distributed Archives for Neurophysiology Data Integration. We draw on our team's extensive experience in informatics, standards development, software engineering, and community building, and on a robust open-source software stack, to create this archive. The archive will lower barriers for neuroscientists by using the Neurodata Without Borders (NWB; http://nwb.org) standard as a consistent data format, by providing interoperability with other standards, and by providing robust tools and convenient Web interfaces to interact with the archive.
DANDI will: 1) provide a cloud platform for versioned neurophysiology data storage for the purposes of collaboration, archiving, and preservation; 2) provide easy-to-use tools for submitting neurophysiology data to the archive and accessing data in it; and 3) facilitate adoption of NWB via standardized applications for data ingestion, visualization, and processing. We will work with local investigators, the broader neurophysiology community, and federal and other funders to determine which data will be stored in DANDI and for how long. The archive will also use state-of-the-art data distribution technologies to increase redundancy and fault tolerance, and to allow distributed computing across cloud and local computing resources. Consequently, the effort will significantly reduce the barrier between laboratories and the cloud, fostering collaboration and data exchange. Overall, we aim to leverage our collective expertise to create and support an NWB-based neurophysiology archive that integrates seamlessly with and enhances current researcher workflows, lowers barriers to scientific inquiry and collaboration, and preserves information for wide reuse.
The proposed project will build an easy-to-use infrastructure for scientists to share, collaborate on, and process data from neurophysiology experiments, which form the basis for understanding cellular-level mechanisms of brain function. Open data increases collaboration and benefits researchers, and can also engage high school and college students. An open archive will facilitate data publishing, improve access by scientific communities, and has the potential to accelerate scientific discoveries about the nervous system.