Center for Big Data in Translational Genomics

Haussler, David; Van 'T Veer, Laura

Abstract

The Center for Big Data in Translational Genomics is a multi-institution partnership coordinated by the University of California at Santa Cruz to create scalable infrastructure for the broad application of genomics in biomedicine. Our U.S. partners include UC San Francisco, UC Berkeley, Oregon Health & Science University, Caltech, and several major big data companies. International partners include the European Bioinformatics Institute, the Sanger Centre, the Ontario Institute for Cancer Research, and a computer systems provider. The Center will make software solutions interoperable through the development of standard Application Programming Interfaces (APIs) and tools at multiple levels, from raw sequence data to genetic variation and functional data, through to systems, pathways, and phenotypes. The overriding goal is to create implementations capable of handling genomics datasets that are orders of magnitude larger than those that can now be handled. The APIs and all academic reference implementations will be open source, while several major corporate partners not funded by the project will provide proprietary implementations, creating a competitive ecosystem of interoperable big data genomics software. Extensive all-comers benchmarking will be performed on all implementations, both within and external to our center, to identify the best of breed, and the results will be made broadly available. Design will be driven in part by the needs of a diverse set of separately funded biomedical projects that will serve as pilots. These include the Pan-Cancer whole genome analysis project of the International Cancer Genome Consortium to analyze 2,000 cancer genomes, the UK10K project to analyze 10,000 personal genomes, the UCSF-led I-SPY2 adaptive breast cancer trial, and the omics-guided leukemia therapy project BeatAML at Oregon Health & Science University.

Public Health Relevance

At least half of all diseases have a substantial genomic component, often including contributions from the millions of individually rare but collectively common genetic variations. Only by studying the genomes and transcriptomes of very large numbers of individuals will scientists have the statistical power to discover and understand this vital aspect of the genomic contribution to disease. For this, it is essential that genomics be brought into the big data era, so that analysis of huge datasets is possible and precision diagnosis and treatment based on genomic information are widely deployed.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Specialized Center--Cooperative Agreements (U54)
Project #: 5U54HG007990-05
Application #: 9492718
Study Section: Special Emphasis Panel (ZRG1)
Program Officer: Di Francesco, Valentina

Project Start: 2014-09-29
Project End: 2019-08-31
Budget Start: 2018-05-01
Budget End: 2019-08-31
Support Year: 5
Fiscal Year: 2018
Total Cost
Indirect Cost

Institution

Name: University of California Santa Cruz
Department: Engineering (All Types)
Type: Biomed Engr/Col Engr/Engr Sta
DUNS #: 125084723

City: Santa Cruz
State: CA
Country: United States
Zip Code: 95064

Publications

Kronenberg, Zev N; Fiddes, Ian T; Gordon, David et al. (2018) High-resolution comparative analysis of great ape genomes. Science 360:

Jain, Miten; Olsen, Hugh E; Turner, Daniel J et al. (2018) Linear assembly of a human centromere on the Y chromosome. Nat Biotechnol 36:321-323

Garrison, Erik; Sirén, Jouni; Novak, Adam M et al. (2018) Variation graph toolkit improves read mapping by representing genetic variation in the reference. Nat Biotechnol 36:875-879

Ellrott, Kyle; Bailey, Matthew H; Saksena, Gordon et al. (2018) Scalable Open Science Approach for Mutation Calling of Tumor Exomes Using Multiple Genomic Pipelines. Cell Syst 6:271-281.e7

Fiddes, Ian T; Armstrong, Joel; Diekhans, Mark et al. (2018) Comparative Annotation Toolkit (CAT)-simultaneous clade and personal genome annotation. Genome Res 28:1029-1038

Paten, Benedict; Eizenga, Jordan M; Rosen, Yohei M et al. (2018) Superbubbles, Ultrabubbles, and Cacti. J Comput Biol 25:649-663

Tyson, John R; O'Neil, Nigel J; Jain, Miten et al. (2018) MinION-based long-read sequencing and assembly extends the Caenorhabditis elegans reference genome. Genome Res 28:266-274

Jain, Miten; Koren, Sergey; Miga, Karen H et al. (2018) Nanopore sequencing and assembly of a human genome with ultra-long reads. Nat Biotechnol 36:338-345

Computational Pan-Genomics Consortium (2018) Computational pan-genomics: status, promises and challenges. Brief Bioinform 19:118-135

Kolmogorov, Mikhail; Armstrong, Joel; Raney, Brian J et al. (2018) Chromosome assembly of large and complex genomes using multiple references. Genome Res 28:1720-1732

Showing the most recent 10 out of 76 publications
