With the reference human genome sequence now completed, the next wave of large-scale sequencing will be aimed at genomes that can further inform the human sequence or otherwise provide significant value for biological discovery. These sequences must be of high quality, yet must be generated efficiently and at a substantially lower cost. In this proposal, we describe technical developments that will allow us to produce longer sequence read lengths, decrease sequencing costs, improve physical map construction, streamline genome assembly, and automate sequence finishing. To support these advances, we will develop enhanced informatics tools and infrastructure to effectively integrate and improve management of the entire range of our laboratory processes. On the basis of these technical developments, we will produce genome sequence data at a rate of 3.3M reads/month in Year 1, scaling moderately to 3.8M reads/month in Year 3. Over the same time period, we aim to increase average read length by at least 300 bp, and to cut our per-read cost from $1.35 to $0.75 or less. Refined methods and tools to more efficiently finish genome sequences to high quality and continuity standards, as well as methods and tools for detection and annotation of genes and other elements encoded within those genomes, will further enhance the output data from our Center. Coupled with advances in strategy, these improvements will substantially improve the efficiency and the economics of genome sequencing, making it much more feasible to consider the analysis of additional human and animal genomes.
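The throughput and cost targets above can be combined into a rough monthly figure. The sketch below is illustrative only: it assumes the $1.35 per-read cost applies at the Year 1 rate and the $0.75 target at the Year 3 rate, which the abstract implies ("over the same time period") but does not state outright. The function name and the returned keys are hypothetical, chosen for this example.

```python
def sequencing_projection(reads_y1, reads_y3, cost_y1, cost_y3):
    """Summarize the change between Year 1 and Year 3 targets.

    Assumption (not stated in the abstract): cost_y1 applies at the
    Year 1 read rate and cost_y3 at the Year 3 read rate.
    """
    return {
        # Fractional increase in monthly read output
        "throughput_growth": reads_y3 / reads_y1 - 1.0,
        # Fractional reduction in cost per read
        "cost_reduction": (cost_y1 - cost_y3) / cost_y1,
        # Implied monthly spend on reads, in dollars
        "monthly_cost_y1": reads_y1 * cost_y1,
        "monthly_cost_y3": reads_y3 * cost_y3,
    }


# Figures taken from the abstract: 3.3M -> 3.8M reads/month, $1.35 -> $0.75 per read
projection = sequencing_projection(3_300_000, 3_800_000, 1.35, 0.75)

print(f"Throughput growth:   {projection['throughput_growth']:.1%}")
print(f"Per-read cost cut:   {projection['cost_reduction']:.1%}")
print(f"Monthly cost Year 1: ${projection['monthly_cost_y1']:,.0f}")
print(f"Monthly cost Year 3: ${projection['monthly_cost_y3']:,.0f}")
```

Under these assumptions, output rises by roughly 15% while the per-read cost falls by roughly 44%, so the implied monthly read cost would drop even as throughput increases. Because the abstract gives $0.75 as an upper bound ("or less"), the Year 3 figure is a ceiling rather than a forecast.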

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Specialized Center--Cooperative Agreements (U54)
Study Section
Special Emphasis Panel (ZHG1-HGR-P (O1))
Program Officer
Felsenfeld, Adam
Washington University
Schools of Medicine
Saint Louis
United States