A collection of diverse and highly accurate primate genomes is critical to furthering our understanding of human variation and the evolutionary context of genetic disease. The main goal of this proposal is to generate high-quality reference genomes that better represent the complexity of human diversity (i.e., continental human reference genomes) and that significantly improve the quality of index non-human primate (NHP) genomes, bringing them to a quality level more in line with the current human reference genome (GRCh38). We have selected 8 human genomes and 8 NHP genomes for de novo sequencing and assembly using single-molecule real-time sequencing, followed by extensive higher-level resolution using experimental approaches. The end result will be a set of NHP genomes that represent a >10- to 20-fold improvement in assembly contiguity and representation for each species, and human genomes in which >95% of euchromatic unique regions are fully sequenced, annotated, and phased. This project places special emphasis on gaps and gene-rich complex sequence structure, which have been the most intractable euchromatic regions of primate genomes. While there are many metrics of genome assembly completion and sequencing, ours is a practical one: the goal of this project is complete euchromatic sequence in which >95% of the bases are ordered and oriented and >95% of gene models are complete and annotated. Assemblies based on improved genome scaffolding or on simple phasing of short-read data using synthetic long reads add value, but they do not meet the needs of most researchers who study gene models, gene regulation, and genetic variation. The community requires that sequence gaps be resolved and that each genome be assembled at high contiguity. Our strategy is to deliver quality over quantity; as such, we are focusing on a smaller subset of genomes delivered at the highest quality, building upon very recent advances in sequencing technology and assembly.
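The practical completion criterion stated in this abstract (>95% of euchromatic bases ordered and oriented, and >95% of gene models complete) reduces to two simple fractions: one weighted by base length, one by gene-model count. The Python sketch below shows one way to compute them. The `Contig` record, its fields, and the function names are hypothetical illustrations rather than part of any pipeline described in this proposal, and the sketch assumes the input has already been restricted to euchromatic sequence.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Contig:
    """Hypothetical record for one euchromatic contig in an assembly."""
    length: int      # number of bases in the contig
    ordered: bool    # True if its position along the chromosome is known
    oriented: bool   # True if its orientation (strand) is known


def fraction_ordered_oriented(contigs: Sequence[Contig]) -> float:
    """Fraction of bases lying in contigs that are both ordered and oriented."""
    total = sum(c.length for c in contigs)
    if total == 0:
        return 0.0
    placed = sum(c.length for c in contigs if c.ordered and c.oriented)
    return placed / total


def fraction_complete(gene_models_complete: Sequence[bool]) -> float:
    """Fraction of gene models flagged as complete (one bool per model)."""
    if not gene_models_complete:
        return 0.0
    return sum(gene_models_complete) / len(gene_models_complete)


def meets_targets(contigs: Sequence[Contig],
                  gene_models_complete: Sequence[bool],
                  threshold: float = 0.95) -> bool:
    """True only if both fractions strictly exceed the threshold (>95% by default)."""
    return (fraction_ordered_oriented(contigs) > threshold
            and fraction_complete(gene_models_complete) > threshold)


# Usage with made-up numbers: a 10 Mb toy assembly and 100 gene models.
contigs = [Contig(9_600_000, True, True), Contig(400_000, True, False)]
genes = [True] * 96 + [False] * 4
print(fraction_ordered_oriented(contigs))  # 0.96
print(meets_targets(contigs, genes))       # True
```

Weighting the first fraction by length rather than by contig count reflects the wording "95% of the bases": a few small unplaced contigs barely move it, whereas one large unoriented contig can push an assembly below target.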
There is a continued need to develop non-human primate (NHP) models of human disease and to more fully understand genetic variation within our species. The primary goal of this resource-related research project is to replace existing NHP reference genomes with more complete versions that will accelerate research and increase the utility of NHPs to the scientific community. Equally important are our plans to increase our understanding of human genetic diversity by creating new reference genomes from humans of varied geographical origins. The complexity of human genome structural variation justifies the establishment of new, high-quality reference genomes in which difficult regions are resolved. This effort will allow us to more fully understand the complete spectrum of human genetic variation and its evolutionary origins, and it will increase power to identify the genetic etiology of complex human disease.