The Center for Big Data in Translational Genomics will pioneer common Application Programming Interfaces (APIs) for big genomics data in biomedicine. This effort will involve multiple groups in academia, medicine and industry, as well as interaction with a recently formed global alliance for the responsible sharing of genomic and clinical data. The Center will create reference implementations to drive API adoption. Through coordination with industry, these implementations will be readily deployable across the broadest range of commercial clouds, including those of Amazon, Google and Microsoft, as well as within private clouds. Alongside APIs that drive standards, the Center will create a continuously operating benchmarking platform for large-scale genomics analysis methods, open to worldwide use. This platform will identify best-of-breed methods and drive collective improvement across big data genomics. Together, the APIs and benchmarking efforts will create a rich infrastructure for genomics software developers. To make these underlying computational methods available to the wider biomedical community, the Center will build large-scale genomics analysis tools on top of the big genomics data APIs. These will include tools for read mapping, variant analysis, transcript analysis, pathway analysis and interactive data visualization, allowing researchers to routinely tackle data sets orders of magnitude larger than is currently possible. The APIs and tools must be developed and adapted throughout the project to meet the current and continually growing needs of biomedicine. To ensure this, the Center will pilot them in a variety of driving projects: the UK10K project (population genetics and disease association research), the ICGC Pan-Cancer analysis of 2,000 tumours (large-scale cancer genomics), the I-SPY2 breast cancer trial (clinical trials) and the BeatAML omics-guided leukemia project (clinical practice).
Collectively, these projects represent some two petabytes of raw data and span a variety of uses of genomics in biomedicine, ensuring that the software developed will apply to the broadest range of problems.

Public Health Relevance

At present, most genomics data is locked in medical center silos, each of which develops its own data representations and analysis methods. Without standardized cross-center data exchange and collective benchmarking of computational procedures for accuracy and efficiency, medical genomics will become locked into inadequate and incompatible legacy approaches. An open, international, competitive and modern software development approach to data sharing, benchmarking and new computational tool development is needed, driven by leading biomedical projects to ensure it addresses the most pressing

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Specialized Center--Cooperative Agreements (U54)
Study Section: Special Emphasis Panel (ZRG1-BST-R (52))
Program Officer: Brooks, Lisa
University of California Santa Cruz
Santa Cruz
United States
Rice, Edward S; Kohno, Satomi; John, John St et al. (2017) Improved genome assembly of American alligator genome reveals conserved architecture of estrogen signaling. Genome Res 27:686-696
Vivian, John; Rao, Arjun Arkal; Nothaft, Frank Austin et al. (2017) Toil enables reproducible, open source, big biomedical data analyses. Nat Biotechnol 35:314-316
Rathnam, Chandramouli; Lee, Sanghoon; Jiang, Xia (2017) An algorithm for direct causal learning of influences on patient outcomes. Artif Intell Med 75:1-15
Nikolova, Olga; Moser, Russell; Kemp, Christopher et al. (2017) Modeling gene-wise dependencies improves the identification of drug response biomarkers in cancer studies. Bioinformatics 33:1362-1369
Rand, Arthur C; Jain, Miten; Eizenga, Jordan M et al. (2017) Mapping DNA methylation with high-throughput nanopore sequencing. Nat Methods 14:411-413
Tyner, Cath; Barber, Galt P; Casper, Jonathan et al. (2017) The UCSC Genome Browser database: 2017 update. Nucleic Acids Res 45:D626-D634
Lincoln, Stephen E; Yang, Shan; Cline, Melissa S et al. (2017) Consistency of BRCA1 and BRCA2 Variant Classifications Among Clinical Diagnostic Laboratories. JCO Precis Oncol 1:
Ding, Michael Q; Chen, Lujia; Cooper, Gregory F et al. (2017) Precision Oncology beyond Targeted Therapy: Combining Omics Data with Machine Learning Matches the Majority of Cancer Cells to Effective Therapeutics. Mol Cancer Res :
Kurtz, Stephen E; Eide, Christopher A; Kaempf, Andy et al. (2017) Molecularly targeted drug combinations demonstrate selective effectiveness for myeloid- and lymphoid-derived hematologic malignancies. Proc Natl Acad Sci U S A 114:E7554-E7563
Yang, Shan; Cline, Melissa; Zhang, Can et al. (2017) Data Sharing and Reproducible Clinical Genetic Testing: Successes and Challenges. Pac Symp Biocomput 22:166-176

Showing the most recent 10 out of 55 publications