NIH investigators are gathering terabyte- to petabyte-scale datasets generated using state-of-the-art biomedical technologies. To manage these large datasets properly, researchers need to keep them secure from unauthorized access and from data loss while still being able to share the data with collaborators, and they need to maintain these practices over long periods of time. Although Stanford investigators have access to petabyte-scale High Performance Storage (HPS) systems, these systems are designed for high-performance computation rather than long-term protected storage, and they do not offer granular, secure file sharing. We request funding for a storage appliance known as an object store that will provide a robust mechanism for long-term protected data storage and secure data sharing. Additionally, our proposed storage appliance will be usable by multiple researchers simultaneously and will be simple for users and administrators to manage.

Need for data sharing: Data sharing is essential for expedited translation of research results into knowledge, products, and procedures to improve human health, so researchers are under pressure from a variety of institutions to make their data available. This appliance will enable investigators to put up a lightweight webpage for data sharing (e.g., a laboratory website, wiki, or GitHub page) while storing the bulk of the data on the appliance.

Need for data security: Raw biomedical data are often shared under strict security and privacy requirements. Our proposed appliance manages file access through authentication, authorization, and access controls that use Stanford's user database. Its fine-grained access controls will allow files to be shared with limited sets of users, which is valuable when exchanging data with collaborators.

Need for data protection: The traditional RAID scheme that HPS systems use to provide high throughput is not resilient enough for long-term data protection in the age of high-capacity disks (8 TB per hard drive). Our proposed object store uses an erasure-coding scheme, a method that provides better availability at lower overhead and cost than RAID. An appliance like this object store will have a fundamental impact on hundreds of researchers at Stanford by providing them with a flexible and cost-effective resource for long-term protected data storage and secure file sharing.

Public Health Relevance

The requested object store will house the terabyte- and petabyte-scale data that are increasingly being produced and analyzed as part of many biomedical investigations. This device will allow researchers to share their data securely using fine-grained access control and to store metadata in the same objects as their data. These capabilities will greatly expand the ability of investigators to curate and transmit their data to the widest possible scientific audience.

Agency
National Institutes of Health (NIH)
Institute
Office of The Director, National Institutes of Health (OD)
Type
Biomedical Research Support Shared Instrumentation Grants (S10)
Project #
1S10OD025082-01
Application #
9494292
Study Section
Special Emphasis Panel (ZRG1)
Program Officer
Horska, Alena
Project Start
2018-04-15
Project End
2019-04-14
Budget Start
2018-04-15
Budget End
2019-04-14
Support Year
1
Fiscal Year
2018
Total Cost
Indirect Cost
Name
Stanford University
Department
Genetics
Type
Schools of Medicine
DUNS #
009214214
City
Stanford
State
CA
Country
United States
Zip Code
94304