We propose a "data commons" to provide data storage for seven core facilities that produce or analyze large and complex datasets for biomedical researchers at Duke University and their collaborators worldwide. With data storage requirements already in the hundreds of terabytes, the core facilities continue to adopt new technologies that will increase data flows by orders of magnitude, a looming challenge for data retention and analysis. Our proposal seeks to turn this abundance of data from a challenge into a strength for life sciences researchers. Equipment purchased with the NIH SIG grant will combine disk arrays (~450 usable terabytes) and a tape library with an initial uncompressed capacity of 1 petabyte to create a scalable data storage resource with features that protect, preserve, and efficiently manage data.

The disk arrays will be configured to serve data to existing computational servers for analysis, either by researchers themselves or by staff in Duke's "Omics Analysis Core Facility," which was created in 2012 to support researchers unfamiliar with working with genomic data sets. The disk arrays and tape library will operate in a coordinated fashion within a Quantum StorNext file system, and data management policies, together with collection of system usage data, will allow fine-grained optimization of system performance and economy. Quantum's systems have been widely deployed in data-intensive industry and research, including the genome sciences. From the outset, the core facilities will use the equipment to enhance their integration. Data provenance and management features of Duke's Express Data Repository, used by the proteomics and microarray core facilities since 2006, will be connected to the proposed storage equipment so that services can expand to include sequence data, RNAi screening data, microscopy images, and results from the analysis core. The proposed data commons will thus allow efficient and, in many cases, automatic data hand-offs within a protected storage framework. This integration removes logistical impediments to combining data, reduces the chance of accidental (or malicious) data corruption, and expands the scope for automation in large-scale research projects.

The storage will be connected, via the core facilities, to data-producing equipment, to computational servers run by the 'omics analysis core, and to storage on a high-performance analysis cluster. All IT assets, including the proposed equipment, are administered by professional IT staff with particular strength in the infrastructure required for large-scale research in the genome sciences. While more efficient data management will be an immediate benefit, the more important goal of the project is to enable researchers to accelerate their use of integrated, complex data to explore the complexity of human health and disease.
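To make the tiered-storage idea concrete, the sketch below illustrates the general kind of age-based migration policy that a hierarchical storage manager such as StorNext's Storage Manager automates between a disk tier and a tape tier. This is a minimal, hypothetical example for illustration only: the paths, the 90-day threshold, and the function name are assumptions, not the proposal's actual configuration or Quantum's software interface.

```python
# Illustrative sketch only: a simplified age-based tiering policy of the kind a
# hierarchical storage manager automates. All paths and thresholds are hypothetical.
import shutil
import time
from pathlib import Path

DISK_TIER = Path("/data_commons/disk")        # hypothetical fast disk-array tier
TAPE_STAGE = Path("/data_commons/tape_out")   # hypothetical staging area for tape archival
AGE_THRESHOLD_DAYS = 90                       # hypothetical policy: archive data untouched for 90 days


def migrate_cold_files(now: float | None = None) -> list[Path]:
    """Move files not accessed within the threshold from disk to the tape staging area."""
    now = now or time.time()
    cutoff = now - AGE_THRESHOLD_DAYS * 86400
    migrated: list[Path] = []
    for path in DISK_TIER.rglob("*"):
        if path.is_file() and path.stat().st_atime < cutoff:
            dest = TAPE_STAGE / path.relative_to(DISK_TIER)
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(path), str(dest))
            migrated.append(dest)
    return migrated


if __name__ == "__main__":
    moved = migrate_cold_files()
    print(f"Staged {len(moved)} cold files for tape archival")
```

In practice, the production system would apply such policies transparently within the file system rather than through scripts like this; the sketch is only meant to show how usage data (here, file access times) can drive automatic placement of data on the most economical tier.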