A collection of diverse and highly accurate primate genomes is critical to furthering our understanding of human variation and the evolutionary context of genetic disease. The main goal of this proposal is to generate high-quality reference genomes that better represent the complexity of human diversity (i.e., continental human reference genomes) and that significantly improve the quality of index non-human primate (NHP) genomes, reaching a quality level more in line with the current human genome (GRCh38). We have selected 8 human genomes and 8 NHP genomes for de novo sequencing and assembly using single-molecule real-time sequencing, followed by extensive higher-level resolution using experimental approaches. The end result will be a set of NHP genomes, each representing a >10-20 fold improvement in assembly continuity and representation, and a set of human genomes in which >95% of euchromatic unique regions are fully sequenced, annotated, and phased. This project places special emphasis on gaps and gene-rich complex sequence structure, which have been the most intractable euchromatic regions of primate genomes. While there are many metrics of genome assembly completion and sequencing, ours is a practical one: complete euchromatic sequence in which >95% of the bases are ordered and oriented, and >95% of gene models are complete and annotated. Assemblies based on improved genome scaffolding or simple phasing of short-read data using synthetic long reads add value but do not meet the needs of most researchers who are interested in studying gene models, gene regulation, and genetic variation. The community requires that sequence gaps be resolved and that each genome be assembled at high contiguity. Our strategy is to deliver quality over quantity, and as such we are focused on a smaller subset of genomes delivered at the highest quality, building upon very recent advances in sequencing technology and assembly.

Public Health Relevance

There is a continued need to develop non-human primate (NHP) models of human disease and to more fully understand genetic variation within our species. The primary goal of this resource-related research project is to replace existing NHP genome references with more complete versions that will accelerate research and increase the utility of NHPs to the scientific community. Equally important are our plans to increase our understanding of human genetic diversity by creating new reference genomes from humans of varied geographical origins. The complexity of human genome structural variation justifies the establishment of new high-quality reference genomes in which difficult regions are resolved. This effort will allow us to understand more fully the complete spectrum of human genetic variation and its evolutionary origins, and will lead to increased power in identifying the genetic etiology of complex human disease.

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Resource-Related Research Projects--Cooperative Agreements (U24)
Project #: 5U24HG009081-03
Application #: 9481312
Study Section: Special Emphasis Panel (ZHG1)
Program Officer: Felsenfeld, Adam
Project Start: 2016-05-03
Project End: 2019-04-30
Budget Start: 2018-05-01
Budget End: 2019-04-30
Support Year: 3
Fiscal Year: 2018
Total Cost:
Indirect Cost:

Name: Washington University
Department: Genetics
Type: Schools of Medicine
DUNS #: 068552207
City: Saint Louis
State: MO
Country: United States
Zip Code: 63130

Fiddes, Ian T; Armstrong, Joel; Diekhans, Mark et al. (2018) Comparative Annotation Toolkit (CAT)-simultaneous clade and personal genome annotation. Genome Res 28:1029-1038
Kronenberg, Zev N; Fiddes, Ian T; Gordon, David et al. (2018) High-resolution comparative analysis of great ape genomes. Science 360:
Kuderna, Lukas F K; Tomlinson, Chad; Hillier, LaDeana W et al. (2017) A 3-way hybrid approach to generate a new high-quality chimpanzee reference genome (Pan_tro_3.0). Gigascience 6:1-6