The Center for Big Data in Translational Genomics is a multi-institution partnership coordinated by the University of California at Santa Cruz to create scalable infrastructure for the broad application of genomics in biomedicine. Our U.S. partners include UC San Francisco, UC Berkeley, Oregon Health & Science University, Caltech, and several major big data companies. International partners include the European Bioinformatics Institute, the Sanger Centre, the Ontario Institute for Cancer Research, and a computer systems provider. The Center will make software solutions interoperable through the development of standard Application Programming Interfaces (APIs) and tools at multiple levels, from raw sequence data to genetic variation and functional data, through to systems, pathways and phenotypes. The overriding goal is to create implementations capable of handling genomics datasets that are orders of magnitude larger than those that can now be handled. The APIs and all academic reference implementations will be open source, while several major corporate partners not funded by the project will provide proprietary implementations, creating a competitive ecosystem of interoperable big data genomics software. Extensive benchmarking, open to all comers, will be performed on all implementations within and external to our center to identify the best of breed, and the results will be made broadly available. Design will be driven in part by the needs of a diverse set of separately funded specific biomedical projects that will serve as pilots. These include the Pan-Cancer whole genome analysis project of the International Cancer Genome Consortium to analyze 2,000 cancer genomes, the UK10K project to analyze 10,000 personal genomes, the UCSF-led I-SPY2 adaptive breast cancer trial, and the omics-guided leukemia therapy project BeatAML at Oregon Health & Science University.

Public Health Relevance

At least half of all diseases have a substantial genomic component, often including contributions from the millions of individually rare but collectively common genetic variations. Only by studying the genomes and transcriptomes of very large numbers of individuals will scientists have the statistical power to discover and understand this vital aspect of the genomic contribution to disease. For this it is essential that genomics be brought into the big data era, so that analyses of huge datasets are possible and precision diagnosis and treatment based on genomic information are widely deployed.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Specialized Center--Cooperative Agreements (U54)
Study Section: Special Emphasis Panel (ZRG1-BST-R (52)R)
Program Officer: Di Francesco, Valentina
University of California Santa Cruz
Engineering (All Types)
Schools of Engineering
Santa Cruz
United States
Rice, Edward S; Kohno, Satomi; St John, John et al. (2017) Improved genome assembly of American alligator genome reveals conserved architecture of estrogen signaling. Genome Res 27:686-696
Vivian, John; Rao, Arjun Arkal; Nothaft, Frank Austin et al. (2017) Toil enables reproducible, open source, big biomedical data analyses. Nat Biotechnol 35:314-316
Rathnam, Chandramouli; Lee, Sanghoon; Jiang, Xia (2017) An algorithm for direct causal learning of influences on patient outcomes. Artif Intell Med 75:1-15
Nikolova, Olga; Moser, Russell; Kemp, Christopher et al. (2017) Modeling gene-wise dependencies improves the identification of drug response biomarkers in cancer studies. Bioinformatics 33:1362-1369
Rand, Arthur C; Jain, Miten; Eizenga, Jordan M et al. (2017) Mapping DNA methylation with high-throughput nanopore sequencing. Nat Methods 14:411-413
Tyner, Cath; Barber, Galt P; Casper, Jonathan et al. (2017) The UCSC Genome Browser database: 2017 update. Nucleic Acids Res 45:D626-D634
Lincoln, Stephen E; Yang, Shan; Cline, Melissa S et al. (2017) Consistency of BRCA1 and BRCA2 Variant Classifications Among Clinical Diagnostic Laboratories. JCO Precis Oncol 1:
Ding, Michael Q; Chen, Lujia; Cooper, Gregory F et al. (2017) Precision Oncology beyond Targeted Therapy: Combining Omics Data with Machine Learning Matches the Majority of Cancer Cells to Effective Therapeutics. Mol Cancer Res :
Kurtz, Stephen E; Eide, Christopher A; Kaempf, Andy et al. (2017) Molecularly targeted drug combinations demonstrate selective effectiveness for myeloid- and lymphoid-derived hematologic malignancies. Proc Natl Acad Sci U S A 114:E7554-E7563
Yang, Shan; Cline, Melissa; Zhang, Can et al. (2017) Data sharing and reproducible clinical genetic testing: successes and challenges. Pac Symp Biocomput 22:166-176

Showing the most recent 10 out of 55 publications