The human reference genome is part of the foundation of modern human biology, providing a single coordinate system that is critical for interpreting associations between genotype and phenotype in the emerging field of genomic medicine. However, the current primary reference assembly is largely derived from one individual, and thus cannot depict the full extent of sequence variation observed in the human population. This failure of representation causes reference allele bias in mapping, which is worse for some genetic subpopulations and occasionally leads to misinterpretation of gene structure and function. It is therefore imperative that we broaden our reference to include the haplotype diversity observed within a cohort of high-quality, phased genome assemblies obtained from distinct and diverse subpopulations. Such a cohort would capture both common variation and haplotype structure, each of which is proving to be crucial context for interpretation. To address this need, we aim to determine a scalable sequencing/assembly protocol using an optimized collection of emerging sequencing technologies that were not readily accessible in the initial phase of this project (e.g., Oxford Nanopore and improved Illumina linked-read strategies). Based upon current projections, we conservatively anticipate being able to reduce the current cost of approximately $30,000 per high-quality de novo human genome to $15,000 or less, whilst achieving the same or better quality than the best contemporary assemblies. This advance in throughput and production cost, with an emphasis on maintaining or improving quality, will be critical to increasing the scale and scope of genome assembly, with the ultimate goal of applying this work to include a large cohort of diverse individuals in the next reference map.

Public Health Relevance

Expanding the human reference to include a large panel of individuals from diverse subpopulations is expected to provide the bedrock for modern human genetic variant analyses. Here we propose to test new, third-generation sequencing technologies with the goal of generating high-quality reference genomes at reduced production time and cost.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Resource-Related Research Projects--Cooperative Agreements (U24)
Project #
3U24HG009084-03S1
Application #
9694414
Study Section
Program Officer
Felsenfeld, Adam
Project Start
2018-06-01
Project End
2019-04-30
Budget Start
2018-08-13
Budget End
2019-04-30
Support Year
3
Fiscal Year
2018
Total Cost
Indirect Cost
Name
University of California Santa Cruz
Department
Engineering (All Types)
Type
Biomed Engr/Col Engr/Engr Sta
DUNS #
125084723
City
Santa Cruz
State
CA
Country
United States
Zip Code
95064
Lazar, Nathan H; Nevonen, Kimberly A; O'Connell, Brendan et al. (2018) Epigenetic maintenance of topological domains in the highly rearranged gibbon genome. Genome Res 28:983-997
Kuderna, Lukas F K; Tomlinson, Chad; Hillier, LaDeana W et al. (2017) A 3-way hybrid approach to generate a new high-quality chimpanzee reference genome (Pan_tro_3.0). Gigascience 6:1-6