Unravelling the genetic basis of human health and disease requires high-quality genome information. Heretofore, the community has relied on a single, haploid human reference sequence that does not represent the actual genome sequence of any single person. While this resource has been enormously beneficial for mapping many of the genetic variants responsible for normal and disease-related human variation, it can limit many types of analyses. Further, this single-reference model can result in uneven power to analyze genetic variation across human populations. Similarly, non-human primate genomes are fundamentally important for understanding our own genome. While many draft primate genome assemblies are available, including for all great apes, these genomes are all of lower quality and contiguity than the human reference genome. Importantly, chromosome-scale scaffolding of these genomes was often done by comparison to the human reference. While this approximation is generally correct, knowing where it is wrong is critically important. Using a radically innovative and simple approach, we can now generate highly contiguous de novo assemblies of human and non-human primate genomes. The approach requires sub-microgram quantities of DNA and can be carried out from start to finish within a few months, including sequencing time. Our approach uses genome contiguity information derived from proximity ligation of in vitro assembled chromatin. It harnesses the speed and cost-effectiveness of high-throughput sequencing to generate large amounts of haplotype-phased contiguity data spanning well over 100 kilobases in length. Using this approach we will generate highly accurate, partially haplotype-phased de novo genome assemblies from 50 humans and 12 non-human primates, with scaffold N50s expected to be between 10 and 20 Mb in length.
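The scaffold N50 cited above is a standard contiguity statistic: the length of the scaffold at which the running total, taken from longest to shortest, first reaches half of the total assembly length. The following is a minimal sketch of that calculation, not the project's own tooling; the function name `scaffold_n50` and the example lengths are hypothetical and chosen only for illustration.

```python
def scaffold_n50(lengths):
    """Return the N50 of a collection of scaffold lengths (in bases).

    Hypothetical helper for illustration: N50 is the length L such that
    scaffolds of length >= L together cover at least half of the assembly.
    """
    if not lengths:
        raise ValueError("at least one scaffold length is required")

    # Walk scaffolds from longest to shortest, accumulating covered bases.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Illustrative, made-up scaffold lengths in megabases (not project data).
example_lengths_mb = [2, 3, 4, 5, 6]
n50_mb = scaffold_n50(example_lengths_mb)
```

Under this definition, an assembly with a scaffold N50 of 10 Mb has at least half of its assembled bases in scaffolds of 10 Mb or longer.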

Public Health Relevance

The human genome sequence is a critically important resource for biomedical discovery. Using an innovative approach, we will generate genome data from humans and non-human primates that will augment and extend our knowledge of human genome variation.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Resource-Related Research Projects--Cooperative Agreements (U24)
Project #
1U24HG009084-01
Application #
9132581
Study Section
Special Emphasis Panel (ZHG1-HGR-L (J2))
Program Officer
Felsenfeld, Adam
Project Start
2016-06-17
Project End
2019-04-30
Budget Start
2016-06-17
Budget End
2017-04-30
Support Year
1
Fiscal Year
2016
Total Cost
$356,869
Indirect Cost
$119,895
Name
University of California Santa Cruz
Department
Engineering (All Types)
Type
Schools of Engineering
DUNS #
125084723
City
Santa Cruz
State
CA
Country
United States
Zip Code
95064
Lazar, Nathan H; Nevonen, Kimberly A; O'Connell, Brendan et al. (2018) Epigenetic maintenance of topological domains in the highly rearranged gibbon genome. Genome Res 28:983-997
Kuderna, Lukas F K; Tomlinson, Chad; Hillier, LaDeana W et al. (2017) A 3-way hybrid approach to generate a new high-quality chimpanzee reference genome (Pan_tro_3.0). Gigascience 6:1-6