Inter- and Intra-Species Comparative Sequencing

Brody, Lawrence

Abstract

The NIH Intramural Sequencing Center (NISC) has just completed 14 years of successful operation. During that time, we have generated over 57 million Sanger dideoxy-type DNA sequence reads and performed quality sequence finishing of nearly 13,000 genomic bacterial artificial chromosome (BAC) clones. Over the past four years, NISC has carefully adopted and put into production two of the current NextGen DNA sequencing technologies, Illumina and Roche/454. Using these platforms, we have generated about 160 billion reads in the past year alone. Although we remain a mid-scale genome sequencing center, we have maintained advantageous economies of scale while remaining relatively agile. While NISC has undertaken projects of many sizes and types throughout the years, from ESTs to SAGE sequencing, the NISC Comparative Sequencing Program has been its most productive overarching success. The program began with the sequencing of mouse BACs orthologous to human chromosome 7 at the start of the mouse genome project and has extended to over 70 species across numerous targets, including the flagship CFTR target, which encompasses 1 Mb of human chromosome 7. This BAC-based sequencing approach found great utility in scouting new genomes and in specialized targeting of complex genomic regions containing duplications and structural rearrangements that made them intractable to traditional genomic sequencing approaches. Recent advances in genomic library construction, sequencing chemistries and computational assembly programs have begun to reduce the need for the highly effective but relatively expensive Sanger sequencing of BAC shotgun libraries. NISC has been one of the few remaining sequencing groups to keep this method in production, but is currently preparing to finish BACs in progress and shut down this production pipeline by the end of 2011.
In keeping with its Comparative Sequencing interests, several years ago NISC implemented an amplicon-based Sanger sequencing pipeline designed to focus on intra-species variation. Numerous clinically relevant projects were designed to amplify and sequence specific genes and regions of interest in small groups of human subjects, yielding great insights into disease-related genotype/phenotype combinations. The flagship ClinSeq Project greatly advanced the study of atherosclerosis by providing sequence data for 250 genes in over 500 volunteers. While this approach was extremely productive, the combination of large volumes of high-quality sequence data generated by the Illumina platform and the efficient whole-exome genomic enrichment techniques evaluated and adopted by NISC has allowed us to transition to an even more cost-effective approach that provides an increasingly comprehensive data set. As a consequence of these advances, NISC no longer offers Sanger-based amplicon targeted sequencing in production mode.
The adoption of many new sequencing protocols in production created a commensurate need for dramatic changes to sample tracking, flow control and primary analysis pipelines. The rapid design, development and implementation of a new Laboratory Information Management System (LIMS) by a dedicated team has met the initial challenges, and the system continues to evolve quickly to adapt to a continuous flow of changes. A combination of talented IT staff and bioinformaticians has met the challenges of extremely large and complex data sets by implementing and continuously adapting pipeline programs to support the rapidly evolving software associated with each of the sequencing platforms. Beyond primary analysis that results in DNA basecalls and quality scores, NISC has worked closely with members of other NHGRI research groups to implement and support high-throughput production of biologically relevant secondary analysis.
One shining example of these efforts is the production-scale processing of Whole Exome Sequencing (WES) data for all of our clients, the end product of which is distilled sets of variants of interest that are accessible in a user-friendly fashion through the in-house developed VarSifter program. The success of these programs has led to an increasing number of projects, large and small, from a growing number of investigators. The implementation of improved project management tools is helping to address the challenges associated with such growth. In the foreseeable future, NISC plans to provide next-gen sequence data for several large, multi-year projects, including ClinSeq, the Skin Microbiome Project, and the Mouse Methylome Project, a recently initiated collaboration with NIEHS. Our focus is to increase the operational efficiency of the next-gen pipeline, refine existing protocols, implement additional protocols as researchers request new sample and experimental types, and continue to expand the value-added data analysis packages available. Furthermore, we will continue to monitor developments in the rapidly evolving sequencing and informatics technologies, implementing those we deem most appropriate for the sequence data we produce for collaborating investigators.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Production Facilities Intramural Research (ZIB)
Project #: 1ZIBHG000196-11
Application #: 8350093
Support Year: 11
Fiscal Year: 2011
Total Cost: $11,776,841

Institution

Name: National Human Genome Research Institute

Publications

Le Gallo, Matthieu; Rudd, Meghan L; Urick, Mary Ellen et al. (2018) The FOXA2 transcription factor is frequently somatically mutated in uterine carcinosarcomas and carcinomas. Cancer 124:65-73

Duncan, Christopher G; Grimm, Sara A; Morgan, Daniel L et al. (2018) Dosage compensation and DNA methylation landscape of the X chromosome in mouse liver. Sci Rep 8:10138

Zhou, Tongqing; Zheng, Anqi; Baxa, Ulrich et al. (2018) A Neutralizing Antibody Recognizing Primarily N-Linked Glycan Targets the Silent Face of the HIV Envelope. Immunity 48:500-513.e6

Weingarten, Rebecca A; Johnson, Ryan C; Conlan, Sean et al. (2018) Genomic Analysis of Hospital Plumbing Reveals Diverse Reservoir of Bacterial Plasmids Conferring Carbapenem Resistance. MBio 9:

Strongin, Anna; Heller, Theo; Doherty, Dan et al. (2018) Characteristics of Liver Disease in 100 Individuals With Joubert Syndrome Prospectively Evaluated at a Single Center. J Pediatr Gastroenterol Nutr 66:428-435

Roessler, Erich; Hu, Ping; Marino, Juliana et al. (2018) Common genetic causes of holoprosencephaly are limited to a small set of evolutionarily conserved driver genes of midline development coordinated by TGF-β, hedgehog, and FGF signaling. Hum Mutat 39:1416-1427

Gourh, Pravitt; Remmers, Elaine F; Boyden, Steven E et al. (2018) Brief Report: Whole-Exome Sequencing to Identify Rare Variants and Gene Networks That Increase Susceptibility to Scleroderma in African Americans. Arthritis Rheumatol 70:1654-1660

Randall, Thomas A; Mullikin, James C; Mueller, Geoffrey A (2018) The Draft Genome Assembly of Dermatophagoides pteronyssinus Supports Identification of Novel Allergen Isoforms in Dermatophagoides Species. Int Arch Allergy Immunol 175:136-146

Kimble, Danielle C; Lach, Francis P; Gregg, Siobhan Q et al. (2018) A comprehensive approach to identification of pathogenic FANCA variants in Fanconi anemia patients and their families. Hum Mutat 39:237-254

Harris, Melissa L; Fufa, Temesgen D; Palmer, Joseph W et al. (2018) A direct link between MITF, innate immunity, and hair graying. PLoS Biol 16:e2003648

Showing the most recent 10 out of 209 publications
