The Open Science Data Framework (OSDF) will provide the architecture and software necessary for conducting bioinformatics analyses in a federated, cloud-enabled data and computational environment. The OSDF consists of a data store and data exchange procedures with an Application Programming Interface (API) to support data submission, retrieval, and analysis for the user community. OSDF will support a diverse set of users, including (1) sequence generators that need to store and process raw data, (2) tool and pipeline developers that need access to reference data sets, and (3) web-based resources that need real-time querying of reference data. The genomics community will be able to use this resource to process human genomic, transcriptomic, and metagenomic data and to conduct analyses that include human variation detection, transcriptome analysis, epigenetic analysis, and microbiome analysis. To accomplish these goals we propose to: (1) establish the OSDF software stack; (2) ensure the usability of these data by integrating OSDF with established community-supported pipelines in cloud-enabled virtual machines; (3) create two OSDF instances that will host publicly available genomic, transcriptomic, and metagenomic data from the 1000 Genomes Project, MG-RAST, and the Human Microbiome Project, along with some of the intermediate and final analysis results; and (4) provide adequate documentation and training so that the user community can use the system.

Public Health Relevance

With the technological innovations and improvements in genome sequencing over the past decade, sequencing is becoming cheaper and will soon become an integral part of medical research and practice. However, the computational resources needed to process these sequence data have not kept pace. With the OSDF, researchers will be able to share and reuse expensive analysis results, thereby reducing the overall cost of conducting translational research that uses genomic data.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project--Cooperative Agreements (U01)
Study Section: Special Emphasis Panel (ZHG1-HGR-M (O3))
Program Officer: Bonazzi, Vivien
University of Maryland Baltimore
Public Health & Prev Medicine
Schools of Medicine
United States
Lloyd-Price, Jason; Mahurkar, Anup; Rahnavard, Gholamali et al. (2017) Strains, functions and dynamics in the expanded Human Microbiome Project. Nature 550:61-66
Sinha, Rashmi; Abu-Ali, Galeb; Vogtmann, Emily et al. (2017) Assessment of variation in microbial community amplicon sequencing by the Microbiome Quality Control (MBQC) project consortium. Nat Biotechnol 35:1077-1086
Blaser, Martin J; Cardon, Zoe G; Cho, Mildred K et al. (2016) Toward a Predictive Understanding of Earth's Microbiomes to Address 21st Century Challenges. MBio 7:
Wilke, Andreas; Bischof, Jared; Gerlach, Wolfgang et al. (2016) The MG-RAST metagenomics database and portal in 2015. Nucleic Acids Res 44:D590-4
Caporaso, J Gregory; Lauber, Christian L; Walters, William A et al. (2012) Ultra-high-throughput microbial community analysis on the Illumina HiSeq and MiSeq platforms. ISME J 6:1621-4