As part of the proposed Center for Big Data in Translational Genomics, engineering teams spread across several institutions will make a large number of APIs, software packages, and web interfaces available to the informatics and research communities. A well-designed training and outreach strategy will give both software users and developers the knowledge they need to integrate and employ these resources effectively in their work. The Center will provide online documentation, help pages, FAQs, text and video tutorials, and usage scenarios for the tools and APIs it develops. Biomedical researchers and data scientists will learn about these tools through in-person training seminars, hands-on workshops, and short courses. Developers will receive online assistance with the Center's APIs and tools through an interactive mailing list and through source code management tools that allow open tracking of issues and developer queries. Finally, within the Center itself, training activities for pre-doctoral and post-doctoral trainees will prepare a new and diverse generation of data scientists to understand, use, and enhance the Center's tools.

Public Health Relevance

The Center's training and outreach strategy will provide a broad spectrum of software users and developers with the knowledge they need to integrate and employ the Center's resources effectively in their work. It will train data scientists to bring genomics into the big data era, using genomics to understand disease and develop new precision treatments.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Specialized Center--Cooperative Agreements (U54)
Study Section: Special Emphasis Panel (ZRG1-BST-R (52))
Program Officer: Brooks, Lisa
University of California Santa Cruz
Santa Cruz
United States
Toor, Jugmohit S; Rao, Arjun A; McShan, Andrew C et al. (2018) A Recurrent Mutation in Anaplastic Lymphoma Kinase with Distinct Neoepitope Conformations. Front Immunol 9:99
Kronenberg, Zev N; Fiddes, Ian T; Gordon, David et al. (2018) High-resolution comparative analysis of great ape genomes. Science 360:
Jain, Miten; Olsen, Hugh E; Turner, Daniel J et al. (2018) Linear assembly of a human centromere on the Y chromosome. Nat Biotechnol 36:321-323
Garrison, Erik; Sirén, Jouni; Novak, Adam M et al. (2018) Variation graph toolkit improves read mapping by representing genetic variation in the reference. Nat Biotechnol 36:875-879
Ellrott, Kyle; Bailey, Matthew H; Saksena, Gordon et al. (2018) Scalable Open Science Approach for Mutation Calling of Tumor Exomes Using Multiple Genomic Pipelines. Cell Syst 6:271-281.e7
Fiddes, Ian T; Armstrong, Joel; Diekhans, Mark et al. (2018) Comparative Annotation Toolkit (CAT)-simultaneous clade and personal genome annotation. Genome Res 28:1029-1038
Paten, Benedict; Eizenga, Jordan M; Rosen, Yohei M et al. (2018) Superbubbles, Ultrabubbles, and Cacti. J Comput Biol 25:649-663
Tyson, John R; O'Neil, Nigel J; Jain, Miten et al. (2018) MinION-based long-read sequencing and assembly extends the Caenorhabditis elegans reference genome. Genome Res 28:266-274
Jain, Miten; Koren, Sergey; Miga, Karen H et al. (2018) Nanopore sequencing and assembly of a human genome with ultra-long reads. Nat Biotechnol 36:338-345
Computational Pan-Genomics Consortium (2018) Computational pan-genomics: status, promises and challenges. Brief Bioinform 19:118-135

Showing the most recent 10 out of 76 publications